(57) Abstract

This invention relates to four chimeric genes, a first encoding a plant
cystathionine γ-synthase (CS), a second encoding feedback-insensitive
aspartokinase, which is operably linked to a plant chloroplast transit sequence,
a third encoding bifunctional feedback-insensitive aspartokinase-homoserine
dehydrogenase (AK-HDH), which is operably linked to a plant chloroplast transit
sequence, and a fourth encoding a methionine-rich protein, all operably linked
to plant seed-specific regulatory sequences. Methods for their use to produce
increased levels of methionine in the seeds of transformed plants are provided.

TITLE

NUCLEIC ACID FRAGMENTS, CHIMERIC GENES
AND METHODS FOR INCREASING THE METHIONINE
CONTENT OF THE SEEDS OF PLANTS

TECHNICAL FIELD

This invention relates to four chimeric genes, a first encoding a plant
cystathionine γ-synthase (CS), a second encoding feedback-insensitive
aspartokinase, which is operably linked to a plant chloroplast transit sequence,
a third encoding bifunctional feedback-insensitive aspartokinase-homoserine
dehydrogenase (AK-HDH), which is operably linked to a plant chloroplast transit
sequence, and a fourth encoding a methionine-rich protein, all operably linked
to plant seed-specific regulatory sequences. Methods for their use to produce
increased levels of methionine in the seeds of transformed plants are provided.
BACKGROUND OF THE INVENTION

Human food and animal feed derived from many grains are deficient in the
sulfur amino acids, methionine and cysteine, which are required in an animal
diet. In corn, the sulfur amino acids are the third most limiting amino acids,
after lysine and tryptophan, for the dietary requirements of many animals. The
use of soybean meal, which is rich in lysine and tryptophan, to supplement corn
in animal feed is limited by the low sulfur amino acid content of the legume.
Thus, an increase in the sulfur amino acid content of either corn or soybean
would improve the nutritional quality of the mixtures and reduce the need for
further supplementation through addition of more expensive methionine.

Efforts to improve the sulfur amino acid content of crops through plant
breeding have met with limited success on the laboratory scale and no success
on the commercial scale. A mutant corn line which had an elevated whole-kernel
methionine concentration was isolated from corn cells grown in culture by
selecting for growth in the presence of inhibitory concentrations of lysine
plus threonine [Phillips et al. (1985) Cereal Chem. 62:213-218]. However,
agronomically-acceptable cultivars have not yet been derived from this line and
commercialized. Soybean cell lines with increased intracellular concentrations
of methionine were isolated by selection for growth in the presence of
ethionine [Madison and Thompson (1988) Plant Cell Reports 7:472-476], but
plants were not regenerated from these lines.

The amino acid content of seeds is determined primarily by the storage
proteins which are synthesized during seed development and which serve as a
major nutrient reserve following germination. The quantity of protein in seeds
varies from about 10% of the dry weight in cereals to 20-40% of the dry weight
of legumes. In many seeds the storage proteins account for 50% or more of the
total protein. Because of their abundance, plant seed storage proteins were
among the first proteins to be isolated. Only recently, however, have the
amino acid sequences of some of these proteins been determined with the use of
molecular genetic techniques. These techniques have also provided information
about the genetic signals that control the seed-specific expression and the
intracellular targeting of these proteins.

One genetic engineering approach to increase the sulfur amino acid content
of seeds is to isolate genes coding for proteins that are rich in the
sulfur-containing amino acids methionine and cysteine, to link the genes to
strong seed-specific regulatory sequences, to transform the chimeric gene into
crop plants and to identify transformants wherein the gene is sufficiently
highly expressed to cause an increase in total sulfur amino acid content.
However, increasing the sulfur amino acid content of seeds by expression of
sulfur-rich proteins may be limited by the ability of the plant to synthesize
methionine, by the synthesis and stability of the methionine-rich protein, and
by effects of over-accumulation of the methionine-rich protein on the viability
of the transgenic seeds.

An alternative approach would be to increase the production and
accumulation of the free amino acid, methionine, via genetic engineering
technology. However, little guidance is available on the control of the
biosynthesis and metabolism of methionine in plants, particularly in the seeds
of plants.

Methionine, along with threonine, lysine and isoleucine, is an amino acid
derived from aspartate. The first step in the pathway is the phosphorylation of
aspartate by the enzyme aspartokinase (AK), and this enzyme has been found to be
an important target for regulation of the pathway in many organisms. The
aspartate family pathway is also believed to be regulated at the branch-point
reactions. For methionine the reduction of aspartyl β-semialdehyde by
homoserine dehydrogenase (HDH) may be an important point of control. The first
committed step to methionine, the production of cystathionine from
O-phosphohomoserine and cysteine by cystathionine γ-synthase (CS), appears to
be the primary point of control of flux through the methionine pathway
[Giovanelli et al. (1984) Plant Physiol. 77:450-455].

Before the present invention, no plant gene encoding CS was available for
use in genetically engineering the methionine biosynthetic pathway. The present
invention provides chimeric CS genes for seed-specific over-expression of the
plant enzyme. Combinations of these genes with other chimeric genes encoding
AK or AK-HDH and methionine-rich seed storage protein provide methods to
increase the level of methionine in seeds.

SUMMARY OF THE INVENTION

Disclosed herein are four chimeric genes, a first encoding a plant
cystathionine γ-synthase (CS), a second encoding lysine-insensitive
aspartokinase (AK), which is operably linked to a plant chloroplast transit
sequence, a third encoding bifunctional feedback-insensitive
aspartokinase-homoserine dehydrogenase (AK-HDH), which is operably linked to a
plant chloroplast transit sequence, and a fourth encoding a methionine-rich
protein, all chimeric genes operably linked to plant seed-specific regulatory
sequences.

The invention includes an isolated nucleic acid fragment encoding a corn
cystathionine γ-synthase.

Also included herein is an isolated nucleic acid fragment comprising:

(a) a first nucleic acid fragment encoding a plant cystathionine
γ-synthase; and

(b) a second nucleic acid fragment encoding aspartokinase
which is insensitive to end-product inhibition. Also disclosed is this isolated
fragment wherein either the first nucleic acid fragment is derived from corn or
wherein the second nucleic acid fragment comprises a nucleotide sequence
essentially similar to the sequence shown in SEQ ID NO:4 encoding E. coli AKIII,
said nucleic acid fragment encoding a lysine-insensitive variant of E. coli
AKIII and further characterized in that at least one of the following
conditions is met:

(1) the amino acid at position 318 is an amino acid other
than threonine, or

(2) the amino acid at position 352 is an amino acid other
than methionine.
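
The two residue conditions above can be expressed as a simple check. The
following is a minimal sketch only: the function name and the dictionary input
format are assumptions made for illustration and are not part of the
disclosure, and position numbering is assumed to follow the amino acid
sequence of SEQ ID NO:4.

```python
# Residues named in conditions (1) and (2): threonine (T) at position 318
# and methionine (M) at position 352, using one-letter amino acid codes.
NAMED_RESIDUES = {318: "T", 352: "M"}


def meets_variant_condition(residues):
    """Return True if at least one of the two stated conditions is met.

    `residues` is a hypothetical input format: a dict mapping a 1-based
    amino acid position to the one-letter code found at that position.
    """
    return any(residues[pos] != named for pos, named in NAMED_RESIDUES.items())


# Illustrative inputs only; these are not variants reported in the text.
unchanged = {318: "T", 352: "M"}
changed_at_318 = {318: "I", 352: "M"}
print(meets_variant_condition(unchanged))
print(meets_variant_condition(changed_at_318))
```

Because the conditions are joined by "or", a change at either position alone
is sufficient, which is why the sketch uses `any` rather than `all`.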

Further disclosed herein is an isolated nucleic acid fragment comprising

(a) a first nucleic acid fragment encoding a plant cystathionine
γ-synthase and

(b) a second nucleic acid fragment encoding a bi-functional
protein with aspartokinase and homoserine dehydrogenase activities, both of
which are insensitive to end-product inhibition. In one embodiment of this
invention, this nucleic acid fragment has a first nucleic acid fragment derived
from corn and in another the second nucleic acid fragment comprises a
nucleotide sequence essentially similar to the E. coli metL gene.

Also disclosed is a nucleic acid fragment comprising a first chimeric gene
wherein a nucleic acid fragment encoding a plant cystathionine γ-synthase is
operably linked to a seed-specific regulatory sequence and a second chimeric
gene wherein a nucleic acid fragment encoding aspartokinase, which is
insensitive to end-product inhibition, is operably linked to a plant
chloroplast transit sequence and to a seed-specific regulatory sequence. This
invention also includes another nucleic acid fragment comprising this same
first chimeric gene and a second chimeric gene wherein a nucleic acid fragment
encoding a bi-functional protein with aspartokinase and homoserine
dehydrogenase activities, both of which are insensitive to end-product
inhibition, is operably linked to a plant chloroplast transit sequence and to a
seed-specific regulatory sequence.

The invention also includes plants comprising in their genomes any of the
fragments or constructs herein described, and their seeds.

The invention further includes a method for increasing the methionine
content of plant seeds comprising:

(a) transforming plant cells with a first chimeric gene wherein a
nucleic acid fragment encoding a plant cystathionine γ-synthase is operably
linked to a seed-specific regulatory sequence;

(b) growing fertile mature plants from the transformed plant
cells obtained from step (a) under conditions suitable to obtain seeds; and

(c) selecting from the progeny seed of step (b) those seeds
containing increased levels of methionine compared to untransformed seeds.

The invention also includes transforming plant cells in step (a) with a nucleic
acid fragment with the same first chimeric gene and a second chimeric gene
wherein a nucleic acid encoding aspartokinase which is insensitive to
end-product inhibition is operably linked to a plant chloroplast transit
sequence and to a seed-specific regulatory sequence, or transforming plant
cells in step (a) with a nucleic acid fragment having the same first chimeric
gene but also having a second chimeric gene wherein a nucleic acid fragment
encoding a bi-functional protein with aspartokinase and homoserine
dehydrogenase activities, both of which are insensitive to end-product
inhibition, is operably linked to a plant chloroplast transit sequence and to a
seed-specific regulatory sequence.

The invention includes plants and seeds having in their genomes any of the
previously described first and second chimeric genes and a third chimeric gene
wherein a nucleic acid fragment encoding a methionine-rich protein, wherein the
weight percent methionine is at least 15%, is operably linked to a
seed-specific regulatory sequence. Also disclosed is a nucleic acid fragment
having the same first, second, and third chimeric genes. Also disclosed is a
method for increasing the methionine content of the seeds of plants comprising
(a) transforming plant cells with this nucleic acid fragment; (b) growing
fertile mature plants from the transformed plant cells obtained from step (a)
under conditions suitable to obtain seeds; and (c) selecting from the progeny
seed of step (b) those seeds containing increased levels of methionine compared
to untransformed seeds.
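
The 15% weight percent methionine threshold mentioned above can be illustrated
with a short calculation. This is a sketch under stated assumptions: this
passage does not spell out how the weight percent is computed, so the code
assumes it is the mass of methionine residues divided by the mass of the whole
polypeptide, using standard average residue masses. The function name and the
example peptide are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water),
# keyed by one-letter amino acid code.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.015  # one water for the free N- and C-termini


def methionine_weight_percent(protein):
    """Assumed definition: 100 * (mass of Met residues) / (polypeptide mass)."""
    total = sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS
    met = protein.upper().count("M") * RESIDUE_MASS["M"]
    return 100.0 * met / total


# Hypothetical toy peptide, not a sequence from the disclosure.
toy = "MAAAA"
print(round(methionine_weight_percent(toy), 1), ">= 15:",
      methionine_weight_percent(toy) >= 15.0)
```

A different convention (for example, using free amino acid masses after
hydrolysis) would give slightly different values, so the threshold comparison
should use whichever definition the specification adopts.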

Further disclosed herein is a chimeric gene wherein the nucleic acid
fragment described on page 3, starting at line 19, is operably linked to a
regulatory sequence capable of expression in microbial cells. Also disclosed is
a method for producing plant cystathionine gamma synthase comprising:

(a) transforming a microbial host cell with that chimeric gene;

(b) growing the transformed microbial cells obtained from
step (a) under conditions that result in the expression of plant cystathionine
gamma synthase protein.

BRIEF DESCRIPTION OF THE
DRAWINGS AND SEQUENCE DESCRIPTIONS

The invention can be more fully understood from the following detailed
description and the accompanying drawings and the sequence descriptions which
form a part of this application.

Figure 1 shows a comparison of the amino acid sequences of part of the
corn CS and E. coli CS proteins.

Figure 2 shows a corn CS genomic DNA fragment, including 5' non-coding
region, exons and introns. The nucleotide sequence and corresponding amino acid
sequence of the first exon are shown and a DNA segment that is deleted in a
corn CS cDNA fragment is indicated.

SEQ ID NO:1 shows the nucleotide sequence of a corn CS cDNA and the
corresponding amino acid sequence of the corn CS protein, described in
Example 1.

SEQ ID NOS:2 and 3 show oligonucleotides used to add a translation
initiation codon to the corn CS gene.

SEQ ID NO:4 shows the nucleotide and amino acid sequence of the coding
region of the wild type E. coli lysC gene, which encodes AKIII, described in
Example 3.

SEQ ID NOS:5 and 6 were used in Example 3 to create an Nco I site at
the translation start codon of the E. coli lysC gene.

SEQ ID NOS:7 and 8 were used in Example 4 to screen a corn library for a
high methionine 10 kD zein gene.

SEQ ID NO:9 shows the nucleotide sequence (2123 bp) of the corn HSZ
gene and the predicted amino acid sequence of the primary translation product.
Nucleotides 753-755 are the putative translation initiation codon and
nucleotides 1386-1388 are the putative translation termination codon.
Nucleotides 1-752 and 1389-2123 include putative 5' and 3' regulatory
sequences, respectively.
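
The coordinates quoted for SEQ ID NO:9 can be checked for internal
consistency with a little arithmetic. The sketch below uses only the numbers
stated above; the variable and function names are illustrative assumptions,
and it assumes the coding region between the two codons is uninterrupted.

```python
# Coordinates as stated for SEQ ID NO:9 (1-based, inclusive).
TOTAL_LENGTH = 2123
START_CODON = (753, 755)
STOP_CODON = (1386, 1388)


def span_length(first, last):
    """Length of a 1-based inclusive interval."""
    return last - first + 1


five_prime = span_length(1, START_CODON[0] - 1)                  # 752 nt
coding_with_stop = span_length(START_CODON[0], STOP_CODON[1])    # 636 nt
three_prime = span_length(STOP_CODON[1] + 1, TOTAL_LENGTH)       # 735 nt

# The three regions should tile the whole sequence with no gap or overlap.
assert five_prime + coding_with_stop + three_prime == TOTAL_LENGTH

# The start and stop codons should be in the same reading frame.
codons, remainder = divmod(coding_with_stop, 3)                  # 212, 0
assert remainder == 0

# The stop codon is not translated, so the primary product has one fewer.
residues = codons - 1                                            # 211
print(five_prime, coding_with_stop, three_prime, residues)
```

The same inclusive-interval convention applies to the other coordinate ranges
given in these sequence descriptions.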
SEQ ID NOS:10 and 11 were used in Example 5 to modify the HSZ gene
by in vitro mutagenesis.

SEQ ID NO:12 shows a 635 bp DNA fragment including the HSZ coding
region only, which can be isolated by restriction endonuclease digestion using
Nco I (5'-CCATGG) to Xba I (5'-TCTAGA). Two Nco I sites that were present
in the native HSZ coding region were eliminated by site-directed mutagenesis,
without changing the encoded amino acid sequence.
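
Locating the recognition sequences named in these descriptions is a plain
string search. The following is a minimal sketch: the recognition sequences
are those quoted in the text (including BspH I, 5'-TCATGA, which is used with
SEQ ID NO:17), while the function names and the toy DNA string are
hypothetical and are not taken from any SEQ ID.

```python
# Recognition sequences as quoted in the sequence descriptions.
SITES = {"Nco I": "CCATGG", "Xba I": "TCTAGA", "BspH I": "TCATGA"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_sites(dna, recognition):
    """Return 1-based start positions of every occurrence in one strand."""
    dna = dna.upper()
    positions = []
    i = dna.find(recognition)
    while i != -1:
        positions.append(i + 1)
        i = dna.find(recognition, i + 1)
    return positions


# All three sequences are palindromic, so scanning one strand is enough.
assert all(reverse_complement(s) == s for s in SITES.values())

toy = "GGCCATGGCTTCTAGAAA"  # hypothetical example sequence
for enzyme, recognition in SITES.items():
    print(enzyme, find_sites(toy, recognition))
```

Both CCATGG and TCATGA contain an ATG, which is consistent with placing such
a site at a translation start codon. Confirming that internal sites have been
eliminated amounts to checking that `find_sites` returns only the expected
positions.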

SEQ ID NOS:13 and 14 were used in Example 5 to create a form of the
HSZ gene with alternative unique restriction endonuclease sites.

SEQ ID NOS:15 and 16 were used in Example 5 to create a gene to code
for the mature form of HSZ.

SEQ ID NO:17 shows a 579 bp DNA fragment including the coding region
of the mature HSZ protein only, which can be isolated by restriction
endonuclease digestion using BspH I (5'-TCATGA) to Xba I (5'-TCTAGA). Two
Nco I sites that were present in the native HSZ coding region were eliminated
by site-directed mutagenesis. This was accomplished without changing the
encoded amino acid sequence.

WO 95/31554    PCT/US95/05545

SEQ ID NOS:18-23 were used in Example 6 to create a corn chloroplast
transit sequence and link the sequence to the E. coli lysC-M4 gene.

SEQ ID NOS:24-25 were used in Example 7 as PCR primers to isolate and
modify the E. coli metL gene.

SEQ ID NO:26 shows the nucleotide sequence of a 3639 bp Xba I corn
genomic DNA fragment encoding two-thirds of the corn CS protein and including
806 bp upstream from the protein coding region as described in Example 1.

SEQ ID NO:27 shows the complete amino acid sequence of the corn CS
protein deduced from the corn cDNA fragment of SEQ ID NO:1 and the corn
genomic DNA fragment of SEQ ID NO:26.

The Sequence Descriptions contain the one letter code for nucleotide
sequence characters and the three letter codes for amino acids as defined in
conformity with the IUPAC-IUB standards described in Nucleic Acids Research
13:3021-3030 (1985) and in the Biochemical Journal 219 (No. 2):345-373 (1984)
which are incorporated by reference herein.

DETAILED DESCRIPTION OF THE INVENTION

The teachings below describe nucleic acid fragments, chimeric genes and
procedures useful for increasing the accumulation of methionine in the seeds of
transformed plants, as compared to levels of methionine in untransformed plants.

In the context of this disclosure, a number of terms shall be utilized. As
used herein, the term "nucleic acid" refers to a large molecule which can be
single-stranded or double-stranded, composed of monomers (nucleotides)
containing a sugar, phosphate and either a purine or pyrimidine. A "nucleic
acid fragment" is a fraction of a given nucleic acid molecule. In higher
plants, deoxyribonucleic acid (DNA) is the genetic material while ribonucleic
acid (RNA) is involved in the transfer of the information in DNA into proteins.
A "genome" is the entire body of genetic material contained in each cell of an
organism. The term "nucleotide sequence" refers to a polymer of DNA or RNA
which can be single- or double-stranded, optionally containing synthetic,
non-natural or altered nucleotide bases capable of incorporation into DNA or
RNA polymers.

As used herein, "essentially similar" refers to DNA sequences that may
involve base changes that do not cause a change in the encoded amino acid, or
which involve base changes which may alter one or more amino acids, but do not
affect the functional properties of the protein encoded by the DNA sequence. It
is therefore understood that the invention encompasses more than the specific
exemplary sequences. Modifications to the sequence, such as deletions,
insertions, or substitutions in the sequence which produce silent changes that
do not substantially affect the functional properties of the resulting protein
molecule are also contemplated. For example, alterations in the gene sequence
which reflect the degeneracy of the genetic code, or which result in the
production of a chemically equivalent amino acid at a given site, are
contemplated; thus, a codon for the amino acid alanine, a hydrophobic amino
acid, may be substituted by a codon encoding another less hydrophobic residue,
such as glycine, or a more hydrophobic residue, such as valine, leucine, or
isoleucine. Similarly, changes which result in substitution of one negatively
charged residue for another, such as aspartic acid for glutamic acid, or one
positively charged residue for another, such as lysine for arginine, can also
be expected to produce a biologically equivalent product. Nucleotide changes
which result in alteration of the N-terminal and C-terminal portions of the
protein molecule would also not be expected to alter the activity of the
protein. In some cases, it may in fact be desirable to make mutants of the
sequence in order to study the effect of alteration on the biological activity
of the protein. Each of the proposed modifications is well within the routine
skill in the art, as is determination of retention of biological activity of
the encoded products. Moreover, the skilled artisan recognizes that
"essentially similar" sequences encompassed by this invention are also defined
by their ability to hybridize, under stringent conditions (0.1X SSC, 0.1% SDS,
65°C), with the sequences exemplified herein.
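
The kinds of base change described in the preceding paragraph can be sorted
mechanically. The sketch below is illustrative only: it uses the standard
genetic code, treats the residue examples given in the paragraph as three
interchangeable groups (an assumption, since the text gives examples rather
than a complete rule), and uses hypothetical function and label names. It
says nothing about whether a given protein retains function, which the text
leaves to routine experimental determination.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Groupings assumed from the examples in the text: alanine with glycine,
# valine, leucine, isoleucine; aspartic with glutamic acid; lysine with
# arginine.
GROUPS = [set("AGVLI"), set("DE"), set("KR")]


def classify_change(codon_before, codon_after):
    """Label a codon change as 'silent', 'conservative' or 'other'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    if before == after:
        return "silent"  # degeneracy of the genetic code
    if any(before in g and after in g for g in GROUPS):
        return "conservative"  # chemically similar residue, per the examples
    return "other"


print(classify_change("GCT", "GCC"))  # alanine to alanine
print(classify_change("GCT", "GGT"))  # alanine to glycine
print(classify_change("GCT", "GAT"))  # alanine to aspartic acid
```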

"Gene" refers to a nucleic acid fragment that expresses a specific protein,
including regulatory sequences preceding (5' non-coding) and following (3'
non-coding) the coding region. "Native" gene refers to the gene as found in
nature with its own regulatory sequences. "Chimeric" gene refers to a gene
comprising heterogeneous regulatory and coding sequences. "Endogenous" gene
refers to the native gene normally found in its natural location in the genome.
A "foreign" gene refers to a gene not normally found in the host organism but
that is introduced by gene transfer.

"Coding sequence" refers to a DNA sequence that codes for a specific
protein and excludes the non-coding sequences.

"Initiation codon" and "termination codon" refer to a unit of three adjacent
nucleotides in a coding sequence that specifies initiation and chain
termination, respectively, of protein synthesis (mRNA translation). "Open
reading frame" refers to the amino acid sequence encoded between translation
initiation and termination codons of a coding sequence.

"RNA transcript" refers to the product resulting from RNA polymerase-catalyzed
transcription of a DNA sequence. When the RNA transcript is a perfect
complementary copy of the DNA sequence, it is referred to as the primary
transcript or it may be an RNA sequence derived from posttranscriptional
processing of the primary transcript. "Messenger RNA (mRNA)" refers to RNA
that can be translated into protein by the cell. "cDNA" refers to a
double-stranded DNA one strand of which is complementary to and derived from
mRNA by reverse transcription. "Sense" RNA refers to RNA transcript that
includes the mRNA.
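
The definitions of initiation codon, termination codon and open reading frame
given above can be illustrated by scanning a coding strand. This is a
simplified sketch, not a gene-finding method: it assumes an uninterrupted
(cDNA-like) sequence, takes ATG as the initiation codon and TAA, TAG and TGA
as termination codons, and returns nucleotide coordinates rather than the
encoded amino acid sequence. The function name and the toy sequence are
hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_open_reading_frame(coding_strand):
    """Return (start, end), 1-based inclusive, or None.

    `start` is the first base of the first ATG that has an in-frame
    termination codon downstream; `end` is the last base of that codon.
    """
    seq = coding_strand.upper()
    start = seq.find("ATG")
    while start != -1:
        # Step through the sequence three bases at a time, in frame.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start + 1, i + 3
        start = seq.find("ATG", start + 1)
    return None


toy = "CCATGGCTAAATAGGG"  # hypothetical sequence: ATG GCT AAA TAG
print(first_open_reading_frame(toy))
```

Note that the out-of-frame TAA in the toy sequence is skipped, because only
codons in the same frame as the initiation codon terminate translation.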

As used herein, "regulatory sequences" refer to nucleotide sequences
located upstream (5'), within, and/or downstream (3') to a coding sequence,
which control the transcription and/or expression of the coding sequences,
potentially in conjunction with the protein biosynthetic apparatus of the cell.
These regulatory sequences include promoters, translation leader sequences,
transcription termination sequences, and polyadenylation sequences.

"Promoter" refers to a DNA sequence in a gene, usually upstream (5') to its
coding sequence, which controls the expression of the coding sequence by
providing the recognition for RNA polymerase and other factors required for
proper transcription. A promoter may also contain DNA sequences that are
involved in the binding of protein factors which control the effectiveness of
transcription initiation in response to physiological or developmental
conditions. It may also contain enhancer elements.

An "enhancer" is a DNA sequence which can stimulate promoter activity.
It may be an innate element of the promoter or a heterologous element inserted
to enhance the level and/or tissue-specificity of a promoter. "Constitutive
promoters" refers to those that direct gene expression in all tissues and at
all times. "Organ-specific" or "development-specific" promoters as referred to
herein are those that direct gene expression almost exclusively in specific
organs, such as leaves or seeds, or at specific development stages in an organ,
such as in early or late embryogenesis, respectively.

The term "operably linked" refers to nucleic acid sequences on a single
nucleic acid molecule which are associated so that the function of one is
affected by the other. For example, a promoter is operably linked with a
structural gene (e.g., a gene encoding aspartokinase that is lysine-insensitive
as given herein) when it is capable of affecting the expression of that
structural gene (i.e., that the structural gene is under the transcriptional
control of the promoter).

The term "expression", as used herein, is intended to mean the production
of the protein product encoded by a gene. More particularly, "expression"
refers to the transcription and stable accumulation of the sense (mRNA) or
antisense RNA derived from the nucleic acid fragment(s) of the invention that,
in conjunction with the protein apparatus of the cell, results in altered
levels of protein product.

"Antisense inhibition" refers to the production of antisense RNA transcripts
capable of preventing the expression of the target protein. "Overexpression"
refers to the production of a gene product in transgenic organisms that exceeds
levels of production in normal or non-transformed organisms. "Altered levels"
refers to the production of gene product(s) in transgenic organisms in amounts
or proportions that differ from that of normal or non-transformed organisms.

The "3' non-coding sequences" refers to the DNA sequence portion of a
gene that contains a polyadenylation signal and any other regulatory signal
capable of affecting mRNA processing or gene expression. The polyadenylation
signal is usually characterized by affecting the addition of polyadenylic acid
tracts to the 3' end of the mRNA precursor.

The "translation leader sequence" refers to that DNA sequence portion of a
gene between the promoter and coding sequence that is transcribed into RNA and
is present in the fully processed mRNA upstream (5') of the translation start
codon. The translation leader sequence may affect processing of the primary
transcript to mRNA, mRNA stability or translation efficiency.

"Mature" protein refers to a post-translationally processed polypeptide
without its targeting signal. "Precursor" protein refers to the primary product
of translation of mRNA. A "chloroplast targeting signal" is an amino acid
sequence which is translated in conjunction with a protein and directs it to
the chloroplast. "Chloroplast transit sequence" refers to a nucleotide sequence
that encodes a chloroplast targeting signal.

"End-product inhibition" or "feedback inhibition" refers to a biological
regulatory mechanism wherein the catalytic activity of an enzyme in a
biosynthetic pathway is reversibly reduced by binding to one or more of the
end-products of the pathway when the concentration of the end-product(s)
reaches a sufficiently high level, thus slowing the biosynthetic process and
preventing over-accumulation of the end-product.
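
The effect of end-product inhibition, and of removing it, can be pictured with
a toy rate expression. The sketch below is an assumption made for illustration
and is not a kinetic model taken from this disclosure: it uses a simple
hyperbolic form, v = v_max / (1 + [end product] / K_i), with hypothetical
parameter values in arbitrary units. A feedback-insensitive enzyme is
represented by an infinitely large K_i.

```python
def enzyme_rate(v_max, end_product, k_i):
    """Rate under an assumed hyperbolic inhibition model.

    v_max       : uninhibited rate (arbitrary units)
    end_product : end-product concentration (same units as k_i)
    k_i         : inhibition constant; float("inf") means insensitive
    """
    return v_max / (1.0 + end_product / k_i)


# Hypothetical values, chosen only to show the qualitative behaviour.
for concentration in (0.0, 2.0, 20.0):
    sensitive = enzyme_rate(10.0, concentration, k_i=2.0)
    insensitive = enzyme_rate(10.0, concentration, k_i=float("inf"))
    print(concentration, sensitive, insensitive)
```

In this picture the sensitive enzyme slows as the end-product accumulates,
while the insensitive enzyme keeps its full rate, which is the rationale given
in the text for using feedback-insensitive AK and AK-HDH.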

"Transformation" herein refers to the transfer of a foreign gene into the
genome of a host organism and its genetically stable inheritance. Examples of
methods of plant transformation include Agrobacterium-mediated transformation
and particle-accelerated or "gene gun" transformation technology.

"Host cell" means the cell that is transformed with the introduced genetic
material.

Isolation of a Plant CS Gene

In order to increase the accumulation of free methionine in the seeds of
plants via genetic engineering, a gene encoding cystathionine γ-synthase (CS)
was isolated from a plant for the first time. CS catalyzes the first reaction
wherein cellular metabolites are committed to the synthesis of methionine and
has been implicated to play a key role in the regulation of methionine
biosynthesis. Regulation is not achieved through feedback inhibition of CS by
any of the pathway end-products [Thompson et al. (1982) Plant Physiol.
69:1077-1083], however. Thus over-expression of CS is expected to increase flux
through the methionine branch of the biosynthetic pathway, even when high
levels of methionine are accumulated.

The availability of a plant CS gene is critical. Although bacterial CS genes,
such as the E. coli metB gene [Duchange et al. (1983) J. Biol. Chem.
258:14868-14871], have been isolated, bacterial CS uses O-succinylhomoserine as
a substrate, and has little or no activity with O-phosphorylhomoserine, the
physiological precursor of methionine in plants [Datko et al. (1974) J. Biol.
Chem. 249:1139-1155]. Since plants lack homoserine transsuccinylase and thus do
not produce O-succinylhomoserine, the bacterial genes would have little utility
in plants.

We teach that a plant CS gene can be isolated by complementation of an
E. coli host strain bearing a metB mutation. Such a strain requires methionine
for growth due to inactivation of the E. coli gene that encodes CS. Functional
expression of the plant CS gene allows the strain to grow in the absence of
methionine. A plant cDNA library is constructed in a suitable E. coli
expression vector, introduced into the E. coli host, and clones able to grow in
the absence of methionine are selected. The use of this approach to isolate a
corn CS cDNA gene is presented in detail in Example 1. The nucleotide sequence
of a corn CS cDNA is provided in SEQ ID NO:1. CS genes from other plants could
be similarly isolated by functional complementation of an E. coli metB
mutation. Alternatively, other plant CS genes, either as cDNAs or genomic DNAs,
could be isolated by using the corn CS gene as a DNA hybridization probe. In
Example 1 we demonstrate the isolation of a corn genomic DNA fragment, shown in
SEQ ID NO:26.

Nucleic acid fragments carrying plant CS genes can be used to produce the
plant CS protein in heterologous host cells. The plant CS protein so produced
can be used to prepare antibodies to the protein by methods well-known to those
skilled in the art. The antibodies are useful for detecting plant CS protein in
situ in plant cells or in vitro in plant cell extracts. Additionally, the plant
CS protein can be used as a target to design and/or identify inhibitors of the
enzyme that may be useful as herbicides. This is desirable because CS
represents a rate-limiting enzyme in an essential biochemical pathway.
Furthermore, inhibition of methionine biosynthesis may have additional
pleiotropic effects, since methionine is metabolized to S-adenosyl-methionine,
which is used in many important cellular processes. Preferred heterologous host
cells for production of plant CS protein are microbial hosts. Microbial
expression systems and expression vectors containing regulatory sequences that
direct high level expression of foreign proteins are well known to those
skilled in the art. Any of these could be used to construct chimeric genes for
production of plant CS. These chimeric genes could then be introduced into
appropriate microorganisms via transformation to provide high level expression
of plant CS. An example of high level expression of plant CS in a bacterial
host is provided (Example 2).

Isolation of AK Genes

Over-expression of feedback-insensitive AK increases flux through the
entire pathway of aspartate-derived amino acids even in the presence of high
concentrations of the pathway end-products lysine, threonine and methionine.
This increased flux provides more substrate for CS and increases the potential
for methionine over-accumulation.

Provided herein is a unique nucleic acid fragment wherein a CS chimeric
gene is linked to a chimeric gene for AK, which is insensitive to
feedback-inhibition by end-products of the biosynthetic pathway. Also provided
is a unique nucleic acid fragment wherein a CS chimeric gene is linked to a
chimeric gene for a bi-functional enzyme, AK-HDH, both activities of which are
insensitive to feedback-inhibition by end-products of the biosynthetic pathway.
Over-expression of feedback-insensitive AK-HDH directs the increased flux
through the methionine-threonine branch of the aspartate-derived amino acid
pathway, further increasing the potential for methionine and threonine
biosynthesis.

A number of AK and AK-HDH genes have been isolated and sequenced.
These include the thrA gene of E. coli [Katinka et al. (1980) Proc. Natl. Acad.
Sci. USA 77:5730-5733], the metL gene of E. coli [Zakin et al. (1983) J. Biol.
Chem. 258:3028-3031], the lysC gene of E. coli [Cassan et al. (1986) J. Biol.
Chem. 261:1052-1057], and the HOM3 gene of S. cerevisiae [Rafalski et al.
(1988) J. Biol. Chem. 263:2146-2151]. The thrA gene of E. coli encodes a
bifunctional protein, AKI-HDHI. The AK activity of this enzyme is inhibited by
threonine. The metL gene of E. coli also encodes a bifunctional protein,
AKII-HDHII, and the AK activity of this enzyme is insensitive to all pathway
end-products. The E. coli lysC gene encodes AKIII, which is sensitive to lysine
inhibition. The HOM3 gene of yeast encodes an AK which is sensitive to
threonine.

As indicated above, AK genes are readily available to one skilled in the art
for use in the present invention. A preferred class of AK genes encoding
feedback-insensitive enzymes is derived from the E. coli lysC gene. Procedures
useful for the isolation of the wild type E. coli lysC gene and lysine-insensitive
mutations are presented in detail in Example 3.

The sequences of three mutant lysC genes that encode lysine-insensitive
aspartokinase each differ from the wild type sequence by a single nucleotide,
resulting in a single amino acid substitution in the protein. Other mutations could
be generated at these target sites (see Example 3) in vitro by site-directed
mutagenesis, using methods known to those skilled in the art. Such mutations
would be expected to result in a lysine-insensitive enzyme. Furthermore, the
in vivo method described in Example 3 could be used to easily isolate and
characterize as many additional mutant lysC genes encoding lysine-insensitive
AKIII as desired.
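
The relationship between a single nucleotide change and the resulting single amino acid substitution can be illustrated with a short sketch. The sequences below are hypothetical toy strings, not the lysC sequence, and the function names are assumptions made only for this illustration; the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def point_substitutions(wild_type, mutant):
    """List each nucleotide difference and the amino acid change it causes.

    Returns tuples of (nucleotide position, wild-type base, mutant base,
    codon number, wild-type residue, mutant residue); positions are 1-based.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    changes = []
    for pos, (w, m) in enumerate(zip(wild_type, mutant)):
        if w != m:
            start = pos - pos % 3  # first base of the affected codon
            changes.append((
                pos + 1, w, m, start // 3 + 1,
                CODON_TABLE[wild_type[start:start + 3]],
                CODON_TABLE[mutant[start:start + 3]],
            ))
    return changes


# Hypothetical example: a C-to-T change in the third codon turns Thr into Ile.
wild = "ATGGCTACCTAA"
mut = "ATGGCTATCTAA"
print(point_substitutions(wild, mut))
```

A comparison of this kind, applied to a sequenced mutant and its wild type parent, would identify the codon at which site-directed mutagenesis could introduce alternative substitutions.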

Another preferred class of AK genes are those encoding bi-functional
enzymes, AK-HDH, wherein both catalytic activities are insensitive to end-product
inhibition. A preferred AK-HDH enzyme is E. coli AKII-HDHII encoded by the
metL gene. As indicated above, this gene has been isolated and sequenced
previously. Thus, it can be easily obtained for use in the present invention by the
same method used to obtain the lysC gene described in Example 3. Alternatively,
the gene can be isolated from E. coli genomic DNA via PCR using oligonucleotide
primers, which can be designed based on the published DNA sequence, as
described in Example 7.

In addition to these genes, several plant genes encoding lysine-insensitive
AK are known. In barley, lysine plus threonine-resistant mutants bearing
mutations in two unlinked genes that result in two different lysine-insensitive AK
isoenzymes have been described [Bright et al. (1982) Nature 299:278-279, Rognes
et al. (1983) Planta 157:32-38, Arruda et al. (1984) Plant Physiol. 76:442-446]. In
corn, a lysine plus threonine-resistant cell line had AK activity that was less
sensitive to lysine inhibition than its parent line [Hibberd et al. (1980) Planta
148:183-187]. A subsequently isolated lysine plus threonine-resistant corn mutant
is altered at a different genetic locus and also produces lysine-insensitive AK
[Diedrick et al. (1990) Theor. Appl. Genet. 79:209-215, Dotson et al. (1990)
Planta 182:546-552]. In tobacco there are two AK enzymes in leaves, one lysine-
sensitive and one threonine-sensitive. A lysine plus threonine-resistant tobacco
mutant that expressed completely lysine-insensitive AK has been described
[Frankard et al. (1991) Theor. Appl. Genet. 82:273-282]. These plant mutants
could serve as sources of genes encoding lysine-insensitive AK and be used, based on
the teachings herein, to increase the accumulation of methionine in the seeds of
transformed plants.

A partial amino acid sequence of AK from carrot has been reported
[Wilson et al. (1991) Plant Physiol. 97:1323-1328]. Using this information a set of
degenerate DNA oligonucleotides could be designed, synthesized and used as
hybridization probes to permit the isolation of the carrot AK gene. Recently the
carrot AK gene has been isolated and its nucleotide sequence has been determined
[Matthews et al. (1991) U.S.S.N. 07/746,705]. This gene was used as a
heterologous hybridization probe to isolate the Arabidopsis thaliana AK-HDH
gene [Ghislain et al. (1994) Plant Mol. Biol. 24:835-851], and thus can be used as
a heterologous hybridization probe to isolate the plant genes encoding lysine-
insensitive AK or AK-HDH described above.
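
Designing degenerate oligonucleotides from a partial amino acid sequence amounts to back-translating each residue into all of its possible codons. The following is a minimal sketch of that idea, assuming the standard genetic code and IUPAC ambiguity letters; the peptide used is hypothetical, not the carrot AK sequence, and `degenerate_oligo` is a name chosen only for this illustration.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)

# IUPAC ambiguity codes keyed by the set of bases they stand for.
IUPAC = {frozenset(bases): code for bases, code in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}


def degenerate_oligo(peptide):
    """Back-translate a peptide into one degenerate oligonucleotide.

    Returns the IUPAC-coded sequence and the number of distinct sequences
    in the resulting oligonucleotide pool (its degeneracy).
    """
    letters = []
    degeneracy = 1
    for residue in peptide:
        codons = CODONS_FOR[residue]
        for position in range(3):
            bases = frozenset(codon[position] for codon in codons)
            letters.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(letters), degeneracy


# Hypothetical tripeptide Met-Trp-Lys: only the lysine codon is ambiguous.
print(degenerate_oligo("MWK"))
```

Collapsing codons position by position is a simplification: for leucine, serine and arginine the resulting pool also contains codons for other amino acids. Peptide stretches rich in methionine and tryptophan, which have a single codon each, give the least degenerate probes.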

Construction of Chimeric Genes for Expression of
CS and AK in the Seeds of Plants
In order to increase biosynthesis of methionine in seeds, suitable regulatory
sequences are provided to create chimeric genes for high level seed-specific
expression of the CS and AK or AK-HDH coding regions. The replacement of the
native regulatory sequences accomplishes three things: 1) any methionine-
concentration-dependent regulatory sequences are removed, permitting
biosynthesis to continue in the presence of high levels of free methionine, 2) any
pleiotropic effects that the accumulation of excess free methionine might have on
the vegetative growth of plants are prevented because the chimeric gene(s) is not
expressed in vegetative tissue of the transformed plants, and 3) high level expression of
the enzyme(s) is obtained in the seeds.

The expression of foreign genes in plants is well-established [De Blaere et
al. (1987) Meth. Enzymol. 143:277-291]. Proper level of expression of CS and
AK or AK-HDH mRNAs may require the use of different chimeric genes utilizing
different promoters. Such chimeric genes can be transferred into host plants either
together in a single expression vector or sequentially using more than one vector.
A preferred class of heterologous hosts for the expression of CS and AK or
AK-HDH genes are eukaryotic hosts, particularly the cells of higher plants.
Particularly preferred among the higher plants and the seeds derived from them are
soybean, rapeseed (Brassica napus, B. campestris), sunflower (Helianthus annuus),
cotton (Gossypium hirsutum), corn, tobacco (Nicotiana tabacum), alfalfa
(Medicago sativa), wheat (Triticum sp.), barley (Hordeum vulgare), oats (Avena
sativa L.), sorghum (Sorghum bicolor), rice (Oryza sativa), and forage grasses.
Expression in plants will use regulatory sequences functional in such plants.

The origin of the promoter chosen to drive the expression of the coding
sequence is not critical as long as it has sufficient transcriptional activity to
accomplish the invention by expressing translatable mRNA for CS and AK or
AK-HDH genes in the desired host tissue.

Preferred promoters are those that allow expression of the protein
specifically in seeds. This may be especially useful, since seeds are the primary
source of vegetable amino acids and also since seed-specific expression will avoid
any potential deleterious effect in non-seed organs. Examples of seed-specific
promoters include, but are not limited to, the promoters of seed storage proteins.
The seed storage proteins are strictly regulated, being expressed almost exclusively
in seeds in a highly organ-specific and stage-specific manner [Higgins et al. (1984)
Ann. Rev. Plant Physiol. 35:191-221; Goldberg et al. (1989) Cell 56:149-160;
Thompson et al. (1989) BioEssays 10:108-113]. Moreover, different seed storage
proteins may be expressed at different stages of seed development.

There are currently numerous examples for seed-specific expression of
seed storage protein genes in transgenic dicotyledonous plants. These include
genes from dicotyledonous plants for bean β-phaseolin [Sengupta-Gopalan et al.
(1985) Proc. Natl. Acad. Sci. USA 82:3320-3324; Hoffman et al. (1988) Plant
Mol. Biol. 11:717-729], bean lectin [Voelker et al. (1987) EMBO J.
6:3571-3577], soybean lectin [Okamuro et al. (1986) Proc. Natl. Acad. Sci. USA
83:8240-8244], soybean Kunitz trypsin inhibitor [Perez-Grau et al. (1989) Plant
Cell 1:1095-1109], soybean β-conglycinin [Beachy et al. (1985) EMBO J.
4:3047-3053; Barker et al. (1988) Proc. Natl. Acad. Sci. USA 85:458-462; Chen
et al. (1988) EMBO J. 7:297-302; Chen et al. (1989) Dev. Genet. 10:112-122;
Naito et al. (1988) Plant Mol. Biol. 11:109-123], pea vicilin [Higgins et al. (1988)
Plant Mol. Biol. 11:683-695], pea convicilin [Newbigin et al. (1990) Planta
180:461], pea legumin [Shirsat et al. (1989) Mol. Gen. Genetics 215:326],
rapeseed napin [Radke et al. (1988) Theor. Appl. Genet. 75:685-694], as well as
genes from monocotyledonous plants such as for maize 15 kD zein [Hoffman et al.
(1987) EMBO J. 6:3213-3221; Schernthaner et al. (1988) EMBO J. 7:1249-1253;
Williamson et al. (1988) Plant Physiol. 88:1002-1007], barley β-hordein [Marris et
al. (1988) Plant Mol. Biol. 10:359-366] and wheat glutenin [Colot et al. (1987)
EMBO J. 6:3559-3564]. Moreover, promoters of seed-specific genes, operably
linked to heterologous coding sequences in chimeric gene constructs, also maintain
their temporal and spatial expression pattern in transgenic plants. Such examples
include linking either the phaseolin or Arabidopsis 2S albumin promoters to the
Brazil nut 2S albumin coding sequence and expressing such combinations in
tobacco, Arabidopsis, or Brassica napus [Altenbach et al. (1989) Plant Mol. Biol.
13:513-522; Altenbach et al. (1992) Plant Mol. Biol. 18:235-245; De Clercq et
al. (1990) Plant Physiol. 94:970-979], bean lectin and bean β-phaseolin promoters
to express luciferase [Riggs et al. (1989) Plant Sci. 63:47-57], and wheat glutenin
promoters to express chloramphenicol acetyl transferase [Colot et al. (1987)
EMBO J. 6:3559-3564].

Of particular use in the expression of the nucleic acid fragment of the
invention will be the heterologous promoters from several extensively-
characterized soybean seed storage protein genes such as those for the Kunitz
trypsin inhibitor [Jofuku et al. (1989) Plant Cell 1:1079-1093; Perez-Grau et al.
(1989) Plant Cell 1:1095-1109], glycinin [Nielson et al. (1989) Plant Cell
1:313-328], and β-conglycinin [Harada et al. (1989) Plant Cell 1:415-425]. Promoters
of genes for α'- and β-subunits of soybean β-conglycinin storage protein will be
particularly useful in expressing the CS, AK and AK-HDH mRNAs in the
cotyledons at mid- to late-stages of soybean seed development [Beachy et al.
(1985) EMBO J. 4:3047-3053; Barker et al. (1988) Proc. Natl. Acad. Sci. USA
85:458-462; Chen et al. (1988) EMBO J. 7:297-302; Chen et al. (1989) Dev.
Genet. 10:112-122; Naito et al. (1988) Plant Mol. Biol. 11:109-123] in transgenic
plants, since: a) there is very little position effect on their expression in transgenic
seeds, and b) the two promoters show different temporal regulation: the promoter
for the α'-subunit gene is expressed a few days before that for the β-subunit gene.

Also of particular use in the expression of the nucleic acid fragments of the
invention will be the promoters from several extensively characterized corn seed
storage protein genes such as endosperm-specific promoters from the 10 kD zein
[Kirihara et al. (1988) Gene 71:359-370], the 27 kD zein [Prat et al. (1987) Gene
52:51-*9; Gallardo et al. (1988) Plant Sci. 54:211-281], and the 19 kD zein
[Marks et al. (1985) J. Biol. Chem. 260:16451-16459]. The relative
transcriptional activities of these promoters in corn have been reported [Kodrzycki
et al. (1989) Plant Cell 1:105-114], providing a basis for choosing a promoter for
use in chimeric gene constructs for corn. For expression in corn embryos, the
strong embryo-specific promoter from the GLB1 gene [Kriz (1989) Biochemical
Genetics 27:239-251, Wallace et al. (1991) Plant Physiol. 95:973-975] can be
used.

It is envisioned that the introduction of enhancers or enhancer-like
elements into other promoter constructs will also provide increased levels of
primary transcription for CS and AK or AK-HDH genes to accomplish the
invention. These would include viral enhancers such as that found in the 35S
promoter [Odell et al. (1988) Plant Mol. Biol. 10:263-272], enhancers from the
opine genes [Fromm et al. (1989) Plant Cell 1:977-984], or enhancers from any
other source that result in increased transcription when placed into a promoter
operably linked to the nucleic acid fragment of the invention.

Of particular importance is the DNA sequence element isolated from the
gene for the α'-subunit of β-conglycinin that can confer 40-fold seed-specific
enhancement to a constitutive promoter [Chen et al. (1988) EMBO J. 7:297-302;
Chen et al. (1989) Dev. Genet. 10:112-122]. One skilled in the art can readily
isolate this element and insert it within the promoter region of any gene in order to
obtain seed-specific enhanced expression with the promoter in transgenic plants.
Insertion of such an element in any seed-specific gene that is expressed at different
times than the β-conglycinin gene will result in expression in transgenic plants for a
longer period during seed development.

Any 3' non-coding region capable of providing a polyadenylation signal and
other regulatory sequences that may be required for the proper expression of the
CS and AK coding regions can be used to accomplish the invention. This would
include the 3' end from any storage protein such as the 3' end of the bean phaseolin
gene, the 3' end of the soybean β-conglycinin gene, the 3' end from viral genes
such as the 3' end of the 35S or the 19S cauliflower mosaic virus transcripts, the 3'
end from the opine synthesis genes, the 3' ends of ribulose 1,5-bisphosphate
carboxylase or chlorophyll a/b binding protein, or 3' end sequences from any
source such that the sequence employed provides the necessary regulatory
information within its nucleic acid sequence to result in the proper expression of
the promoter/coding region combination to which it is operably linked. There are
numerous examples in the art that teach the usefulness of different 3' non-coding
regions [for example, see Ingelbrecht et al. (1989) Plant Cell 1:671-680].

DNA sequences coding for intracellular localization sequences may be
added to the AK or AK-HDH coding sequence if required for the proper
expression of the proteins to accomplish the invention. Plant amino acid
biosynthetic enzymes are known to be localized in the chloroplasts and therefore
are synthesized with a chloroplast targeting signal. The plant-derived CS coding
sequence includes the native chloroplast targeting signal, but bacterial proteins
such as E. coli AKIII and AKII-HDHII have no such signal. A chloroplast transit
sequence could, therefore, be fused to the coding sequence. Preferred chloroplast
transit sequences are those of the small subunit of ribulose 1,5-bisphosphate
carboxylase, e.g. from soybean [Berry-Lowe et al. (1982) J. Mol. Appl. Genet.
1:483-498] for use in dicotyledonous plants and from corn [Lebrun et al. (1987)
Nucleic Acids Res. 15:4360] for use in monocotyledonous plants.

Methionine-Rich Storage Protein Chimeric Genes
It may be useful for certain applications to incorporate the excess free
methionine produced via deregulation of the biosynthetic pathway into a storage
protein. This can help to prevent metabolism of the excess free methionine into
such products as S-adenosyl-methionine, which may be undesirable. The storage
protein chosen should contain higher levels of methionine than average proteins.
Ideally, these methionine-rich storage proteins should contain at least 15%
methionine by weight.
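
The "methionine by weight" criterion can be estimated directly from a protein sequence. The sketch below is one possible way to do so, assuming that the figure means the mass of methionine residues divided by the mass of the whole polypeptide; the source does not state the exact definition used. The residue masses are standard average values in daltons, and the sequences are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water per chain for the free N- and C-termini


def methionine_weight_percent(protein):
    """Percentage of the polypeptide mass contributed by methionine residues."""
    total = sum(RESIDUE_MASS[residue] for residue in protein) + WATER
    methionine = protein.count("M") * RESIDUE_MASS["M"]
    return 100.0 * methionine / total


def is_methionine_rich(protein, threshold=15.0):
    """Apply the 'at least 15% methionine by weight' criterion."""
    return methionine_weight_percent(protein) >= threshold


# Hypothetical sequences, used only to exercise the functions.
print(round(methionine_weight_percent("MAMQMLPS"), 1))
print(is_methionine_rich("AGSPVTLK"))
```

Because methionine is heavier than the average residue, a protein's methionine content by weight is somewhat higher than its content by residue count.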

A number of methionine-rich plant seed storage proteins have been
identified and their corresponding genes have been isolated. A gene in corn for a
15 kD zein protein containing about 15% methionine by weight [Pedersen et al.
(1986) J. Biol. Chem. 261:6279-6284], and a gene for a 10 kD zein protein containing
about 30% methionine by weight [Kirihara et al. (1988) Mol. Gen. Genet.
21:477-484; Kirihara et al. (1988) Gene 71:359-370] have been isolated. A gene
from Brazil nut for a seed 2S albumin containing about 24% methionine by weight
has been isolated [Altenbach et al. (1987) Plant Mol. Biol. 8:239-250]. From rice
a gene coding for a 10 kD seed prolamin containing about 25% methionine by
weight has been isolated [Masumura et al. (1989) Plant Mol. Biol. 12:123-130]. A
preferred gene, which encodes the most methionine-rich natural storage protein
known, is an 18 kD zein protein designated high sulfur zein (HSZ) containing
about 37% methionine by weight that has recently been isolated
[PCT/US92/00958, see Example 4]. Thus, methionine-rich storage protein genes
are readily available to one skilled in the art.

The above teachings on the construction of chimeric genes for high-level
seed-specific expression of CS, AK and AK-HDH genes are also applicable to
methionine-rich storage protein genes. Using these teachings, chimeric genes
wherein regulatory sequences useful for obtaining high level seed-specific
expression are linked to methionine-rich storage protein coding sequences are
provided. In addition, there have been several reports on the expression of
methionine-rich seed storage protein genes in transgenic plants. The high-
methionine 2S albumin from Brazil nut has been expressed in the seeds of
transformed tobacco under the control of the regulatory sequences from a bean
phaseolin storage protein gene. The protein was efficiently processed from a
17 kD precursor to the 9 kD and 3 kD subunits of the mature native protein. The
accumulation of the methionine-rich protein in the tobacco seeds resulted in an up
to 30% increase in the level of methionine in the seeds [Altenbach et al. (1989)
Plant Mol. Biol. 13:513-522]. This methionine-rich storage protein has also been
efficiently expressed in Canola seeds [Altenbach et al. (1992) Plant Mol. Biol.
18:235-245]. In another case, high-level seed-specific expression of the 15 kD
methionine-rich zein, under the control of the regulatory sequences from a bean
phaseolin storage protein gene, was found in transformed tobacco; the signal
sequence of the monocot precursor was also correctly processed in these
transformed plants [Hoffman et al. (1987) EMBO J. 6:3213-3221]. As another
example, the 18 kD zein protein containing 37% methionine has been expressed in
tobacco and soybean seeds [PCT/US92/00958].

Introduction of Chimeric Genes into Plants
Various methods of introducing a DNA sequence into eukaryotic cells (i.e.,
of transformation) of higher plants are available to those skilled in the art (see EPO
publications 0 295 959 A2 and 0 138 341 A1). Such methods include those based
on transformation vectors utilizing the Ti and Ri plasmids of Agrobacterium spp.
It is particularly preferred to use the binary type of these vectors. Ti-derived
vectors transform a wide variety of higher plants, including monocotyledonous and
dicotyledonous plants, such as soybean, cotton and rape [Pacciotti et al. (1985)
Bio/Technology 3:241; Byrne et al. (1987) Plant Cell, Tissue and Organ Culture
8:3; Sukhapinda et al. (1987) Plant Mol. Biol. 8:209-216; Lorz et al. (1985) Mol.
Gen. Genet. 199:178; Potrykus (1985) Mol. Gen. Genet. 199:183].

Other transformation methods are available to those skilled in the art, such
as direct uptake of foreign DNA constructs [see EPO publication 0 295 959 A2],
techniques of electroporation [see Fromm et al. (1986) Nature (London) 319:791]
or high-velocity ballistic bombardment with metal particles coated with the nucleic
acid constructs [see Kline et al. (1987) Nature (London) 327:70, and see U.S. Pat.
No. 4,945,050]. Once transformed, the cells can be regenerated by those skilled in
the art.

Of particular relevance are the recently described methods to transform
foreign genes into commercially important crops, such as rapeseed [see De Block
et al. (1989) Plant Physiol. 91:694-701], sunflower [Everett et al. (1987)
Bio/Technology 5:1201], soybean [McCabe et al. (1988) Bio/Technology 6:923;
Hinchee et al. (1988) Bio/Technology 6:915; Chee et al. (1989) Plant Physiol.
91:1212-1218; Christou et al. (1989) Proc. Natl. Acad. Sci. USA 86:7500-7504;
EPO Publication 0 301 749 A2], and corn [Gordon-Kamm et al. (1990) Plant Cell
2:603-618; Fromm et al. (1990) Biotechnology 8:833-839].

There are a number of methods that can be used to obtain plants containing
multiple chimeric genes of this invention. Chimeric genes for seed-specific
expression of CS and AK or AK-HDH can be linked on a single nucleic acid
fragment which can be used for transformation. Alternatively, a plant transformed
with a CS chimeric gene can be crossed with a plant transformed with an AK or
AK-HDH chimeric gene, and hybrid plants carrying both chimeric genes can be
selected. In another method the CS and AK or AK-HDH chimeric genes, carried
on separate DNA fragments, are co-transformed into the target plant and
transgenic plants carrying both chimeric genes are selected. In yet another method
a plant transformed with one of the chimeric genes is re-transformed with the other
chimeric gene.

Similar methods can be used to obtain plants that contain a chimeric gene
with a regulatory sequence capable of producing high level seed-specific
expression for a methionine-rich storage protein gene along with a CS chimeric
gene, with or without an AK or AK-HDH chimeric gene. Plants can be
transformed with a nucleic acid fragment wherein a methionine-rich storage
protein chimeric gene is linked to a CS chimeric gene, with or without an AK or
AK-HDH chimeric gene. Alternatively, the CS, AK or AK-HDH, and methionine-
rich storage protein chimeric genes can be co-transformed into the target plant and
transgenic plant, or the methionine-rich storage protein gene can be introduced
into previously transformed plants that contain a CS chimeric gene, with or
without an AK or AK-HDH chimeric gene. As another alternative, the
methionine-rich storage protein gene can be introduced into a plant and the
transformants obtained can be crossed with plants that contain a CS chimeric gene,
with or without an AK or AK-HDH chimeric gene.

Expression of Chimeric Genes
in Transformed Plants
To analyze for expression of the chimeric CS, AK, AK-HDH and
methionine-rich storage protein gene in seeds and for the consequences of
expression on the amino acid content in the seeds, a seed meal can be prepared by
any suitable method. The seed meal can be partially or completely defatted, via
hexane extraction for example, if desired. Protein extracts can be prepared from
the meal and analyzed for CS, AK or HDH enzyme activities. Alternatively the
presence of any of the proteins can be tested for immunologically by methods well-
known to those skilled in the art. To measure free amino acid composition of the
seeds, free amino acids can be extracted from the meal and analyzed by methods
known to those skilled in the art [Bieleski et al. (1966) Anal. Biochem.
17:278-293]. Amino acid composition can then be determined using any
commercially available amino acid analyzer. To measure total amino acid
composition of the seeds, meal containing both protein-bound and free amino acids
can be acid hydrolyzed to release the protein-bound amino acids and the
composition can then be determined using any commercially available amino acid
analyzer. Seeds expressing the CS, AK, AK-HDH and/or methionine-rich storage
proteins and with higher methionine content than the wild type seeds can thus be
identified and propagated.

EXAMPLES 

The present invention is further defined in the following Examples, in
which all parts and percentages are by weight and degrees are Celsius, unless
otherwise stated. It should be understood that these Examples, while indicating
preferred embodiments of the invention, are given by way of illustration only.
From the above discussion and these Examples, one skilled in the art can ascertain
the essential characteristics of this invention, and without departing from the spirit
and scope thereof, can make various changes and modifications of the invention to
adapt it to various usages and conditions.

EXAMPLE 1 
Isolation of a Plant CS gene 
In order to clone the corn CS gene, RNA was isolated from developing
seeds of corn line H99 19 days after pollination. This RNA was sent to Clontech
Laboratories, Inc., (Palo Alto, CA) for the custom synthesis of a cDNA library in
the vector Lambda Zap II. The conversion of the Lambda Zap II library into a
phagemid library, then into a plasmid library was accomplished following the
protocol provided by Clontech. Once converted into a plasmid library the
ampicillin-resistant clones obtained carry the cDNA insert in the vector pBluescript
SK(-). Expression of the cDNA is under control of the lacZ promoter on the
vector.

Two phagemid libraries were generated using the mixtures of the Lambda
Zap II phage and the filamentous helper phage of 100 µL to 1 µL. Two additional
libraries were generated using mixtures of 100 µL Lambda Zap II to 10 µL helper
phage and 20 µL Lambda Zap II to 10 µL helper phage. The titers of the
phagemid preparations were similar regardless of the mixture used and were about
2 x 10^3 ampicillin-resistant transfectants per µL with E. coli strain XL1-Blue as
the host.

To identify clones that carried the CS gene, E. coli strain BOB105 was
constructed by introducing the F' plasmid from E. coli strain XL1-Blue into strain
UB1005 [Clark (1984) FEMS Microbiol. Lett. 21:189] by conjugation. The
genotype of BOB105 is: F'::Tn10 proA+B+ lacIq Δ(lacZ)M15/nalA37 metB1. The
strain requires methionine for growth due to a mutation in the metB gene that
encodes CS. Functional expression of the plant CS gene should complement the
mutation and allow the strain to grow in the absence of methionine.

To select for clones from the corn cDNA library that carried the CS gene,
100 µL of the phagemid library was mixed with 300 µL of an overnight culture of
BOB105 grown in L broth and incubated at 37° for 15 min. The cells were
collected by centrifugation, resuspended in 400 µL of M9 + vitamin B1 broth and
plated on M9 media containing vitamin B1, glucose as a carbon and energy source,
20 µg/mL threonine (to prevent the possibility of threonine starvation due to
overexpression of CS), 100 µg/mL ampicillin, 20 µg/mL tetracycline, and
0.16 mM IPTG (isopropylthio-β-galactoside). Fifteen plates were prepared and
incubated at 37°. The amount of phagemid added was expected to yield about 2 x
10^ ampicillin-resistant transfectants per plate.
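
The expected number of transfectants per plate follows from the phagemid titer and the volume plated. The arithmetic can be sketched as below, assuming the titer of about 2 x 10^3 ampicillin-resistant transfectants per µL reported for the phagemid preparations and assuming that each of the fifteen plates received 100 µL of phagemid; the function names are illustrative only.

```python
def expected_transfectants(titer_per_ul, volume_ul):
    """Expected transfectants on one plate: titer times volume of phagemid plated."""
    return titer_per_ul * volume_ul


def total_screened(per_plate, plates):
    """Total number of transfectants screened across all plates."""
    return per_plate * plates


# Assumed inputs: 2,000 transfectants per microliter, 100 microliters per plate.
per_plate = expected_transfectants(2_000, 100)
print(per_plate)                      # 200000
print(total_screened(per_plate, 15))  # 3000000
```

Under these assumptions roughly 2 x 10^5 transfectants would be expected on each plate, or about 3 x 10^6 across the fifteen plates.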

Approximately 30 colonies (an average of 2 per plate or 1 per 10^
transfectants) able to grow in the absence of methionine were obtained. No
colonies were observed if the phagemids carrying the corn cDNA library were not
added. Twelve clones were picked and colony purified by streaking on the same
medium described above. Plasmid DNA was isolated from the 12 clones and
retransformed into BOB105. All of the 12 DNAs yielded methionine-independent
transformants demonstrating that a plasmid-borne gene was responsible for the
phenotype. Plasmid DNA was prepared from 7 of these clones and digested with
restriction enzymes EcoR I and Xho I. Agarose gel electrophoresis of the digests
revealed that 5 of the clones had EcoR I and Xho I sites at the ends of the inserts,
as expected from the method used to create the cDNA library. Three of five
plasmids analyzed had a common internal Taq I fragment, indicating that these
plasmids were related. One of three related DNA inserts, derived from plasmid
pFS1088, as well as another unrelated DNA insert, from plasmid pFS1086, was
completely sequenced.

The DNA insert in plasmid pFS1086 is 1048 bp in length and contains a
long open reading frame and a poly A tail, indicating that it represents a corn
cDNA. The deduced amino acid sequence of the open reading frame shows no
similarity to the published sequence of E. coli CS [Duchange et al. (1983) J. Biol.
Chem. 258:14868-14871]. None of the proteins in the GenBank database showed
significant amino acid sequence similarity to the pFS1086 reading frame. Thus,
the function of the protein encoded on plasmid pFS1086 and the reason for its
ability to complement the metB mutation in BOB105 are unknown.

The sequence of the DNA insert in plasmid pFS1088 is shown in SEQ ID
NO:1. It is 1639 bp in length and contains a long open reading frame and a poly A
tail, indicating that it too represents a corn cDNA. The deduced amino acid
sequence of the open reading frame shows 59 percent similarity and 34 percent
identity to the published sequence of E. coli CS (see Figure 1), indicating that it
represents a corn homolog to the E. coli metB gene. Comparison of the amino
acid sequences reveals that amino acid 89 of corn CS aligns with amino acid 1 of
the E. coli protein. Since most amino acid biosynthetic enzymes are localized in
chloroplasts, it is likely that the first 88 amino acids of corn CS are a chloroplast
targeting signal, which is absent in the bacterial protein. The amino acid sequence
in this region has many of the features characteristic of chloroplast targeting
signals, namely a deficiency in negatively charged amino acids and a net positive
charge, a large percentage of the hydroxylated amino acids serine and threonine
(22%), and a large percentage of the small hydrophobic amino acids alanine and
valine (22%).
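
The compositional features listed above can be tallied for any candidate targeting region. The following is a minimal sketch, assuming a simple net-charge estimate that counts only lysine and arginine as positive and aspartate and glutamate as negative, ignoring histidine and the chain termini. The peptide is a hypothetical ten-residue string, not the corn CS sequence.

```python
def targeting_signal_features(peptide):
    """Summarize composition features typical of chloroplast targeting signals."""
    length = len(peptide)

    def count(letters):
        # Number of residues in the peptide that belong to the given set.
        return sum(peptide.count(letter) for letter in letters)

    basic = count("KR")
    acidic = count("DE")
    return {
        "ser_thr_percent": 100.0 * count("ST") / length,
        "ala_val_percent": 100.0 * count("AV") / length,
        "acidic_residues": acidic,
        "basic_residues": basic,
        "net_charge": basic - acidic,  # crude estimate, see assumptions above
    }


# Hypothetical peptide used only to exercise the function.
features = targeting_signal_features("MASSTVRAKL")
print(features)
```

Applied to the first 88 residues of a deduced sequence, a tally of this kind would reproduce figures such as the serine plus threonine and alanine plus valine percentages quoted above.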

The open reading frame in plasmid pFS1088 continues to the 5' end of the
insert DNA, and does not include an ATG initiator codon, indicating that the
cloned cDNA is incomplete. Since chloroplast targeting signals range from about
30 to 100 amino acids in length, and 88 amino acids are present upstream of the
homology between the E. coli and corn CS, it is likely that most of the coding
sequence, including a functional chloroplast targeting signal, is contained in the
cloned insert. The open reading frame of pFS1088 is in frame with the initiator
codon of the lacZ gene carried on the cloning vector. Thus, complementation of
the metB mutation in BOB105 results from expression of a fusion protein
including 37 amino acids from β-galactosidase and the vector polylinker attached
to the truncated corn CS protein.
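
Whether a reading frame stays open all the way to the 5' end of a cloned insert can be checked by reading codons from the first base of each frame until a stop codon is met. The sketch below illustrates that check on a hypothetical toy sequence; it is not the procedure used in this Example, and the function name is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_open_from_five_prime(sequence):
    """For each of the three forward frames, count the codons that can be read
    from the 5' end of the sequence before the first in-frame stop codon."""
    open_codons = {}
    for frame in range(3):
        count = 0
        for i in range(frame, len(sequence) - 2, 3):
            if sequence[i:i + 3] in STOP_CODONS:
                break
            count += 1
        open_codons[frame] = count
    return open_codons


# Hypothetical 11-base insert: frame 0 hits TAA after two codons,
# while frames 1 and 2 remain open to the end of the sequence.
print(codons_open_from_five_prime("GCTGCTTAAGG"))
```

A frame that remains open from the first base of the insert, with no upstream in-frame stop, is consistent with a cDNA truncated at its 5' end and with expression as a fusion to an upstream vector-encoded initiator codon.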

In order to clone the entire 5' end of the corn CS gene the cDNA clone was
used as a DNA hybridization probe to screen a genomic corn library. A genomic
library of corn in bacteriophage lambda was purchased from Stratagene (La Jolla,
California). Data sheets from the supplier indicated that the corn DNA was from
etiolated Missouri 17 corn seedlings. The vector was Lambda FIX II carrying
Xho I fragments 9-23 kb in size. A titer of 1.0 x 10*^ plaque forming units
(pfu)/mL in the amplified stock was indicated by the supplier when purchased.
Prior to screening, the library was re-titered and contained 2.0 x 10* pfu/mL.

The protocol for screening the library by DNA hybridization was provided
by Clontech (Palo Alto, California). About 30,000 pfu were plated per 150-mm
plate on a total of 12 NZCYM agarose plates giving 360,000 plaques. Plating was
done using E. coli LE392 grown in LB + 0.2% maltose + 10 mM MgSO4 as the
host and NZCYM-0.7% agarose as the plating medium. The plaques were grown
overnight at 37°C and placed at 4°C for one hour prior to lifting onto filters. The
plaques were absorbed onto nylon membranes (Amersham Hybond-N, 0.45 µm
pore size), two lifts from each plate, denatured in 0.5 M NaOH, 1.5 M NaCl,
neutralized in 1.5 M NaCl, 1.0 M Tris-Cl pH 8.0, and rinsed in 2X SSC [Sambrook
et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press; Boehringer Mannheim Biochemicals, The Genius™ System
User's Guide for Filter Hybridization, Version 2.0]. The filters were blotted on
Whatman 3MM paper and heated in a vacuum oven at 80°C for two hours.
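
The plating figures in the preceding paragraph can be checked with simple arithmetic. The following is only an illustrative sketch: the function and constant names are not from the specification, and the numbers are the ones stated above (30,000 pfu per plate, 12 plates, two lifts per plate).

```python
# Plating arithmetic for the genomic library screen (names are illustrative).
PFU_PER_PLATE = 30_000   # plaque forming units plated per 150-mm plate
PLATES = 12              # NZCYM agarose plates
LIFTS_PER_PLATE = 2      # duplicate nylon lifts taken from each plate


def total_plaques(pfu_per_plate: int, plates: int) -> int:
    """Total number of plaques screened across all plates."""
    return pfu_per_plate * plates


def total_filters(plates: int, lifts_per_plate: int) -> int:
    """Total number of filters produced by the plaque lifts."""
    return plates * lifts_per_plate


print(total_plaques(PFU_PER_PLATE, PLATES))    # 360000
print(total_filters(PLATES, LIFTS_PER_PLATE))  # 24
```

Twelve plates with duplicate lifts give twenty-four filters, which is the number of filters carried through the hybridization steps of this example.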

A digoxigenin-11-dUTP labeled corn cDNA CS probe was prepared by
random primed DNA labeling using Genius 2 DNA Labeling Kit (Boehringer
Mannheim Biochemicals, The Genius™ System User's Guide for Filter
Hybridization, Version 2.0). The DNA fragment used for labeling was an Nco I to
BspH I fragment (1390 bp) from plasmid pFS1088 isolated by low melting point
(LMP) agarose gel electrophoresis and NACS purification (Bethesda Research
Laboratories). The 1390 bp band was excised from 0.7% LMP agarose, melted,
and diluted into 0.5 M NaCl and loaded onto a NACS column, which was then
washed with 0.5 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA and the fragment
eluted with 2 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA. An estimate of the
yield of DIG-labeled DNA followed the Boehringer Mannheim Biochemicals
procedure for chemiluminescent detection with Lumi-Phos 530, replacing the 2%
Blocking reagent for nucleic acid hybridization with 5% Blotting Grade Blocker
(Bio-Rad Laboratories, Hercules, California).

The twenty-four 150-mm nylon filters carrying the λ phage plaques were
prewashed in 0.1X SSC, 0.5% SDS at 65°C for one hour. Overnight
prehybridization at 65°C was carried out in 5X SSC [see Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press],
0.5% Blocking reagent for nucleic acid hybridization (Boehringer Mannheim
Biochemicals), 1.0% N-lauroylsarcosine, and 0^% SDS. The filters were
hybridized overnight in fresh prehybridization solution with denatured DIG-labeled
corn CS cDNA probe at 10 ng DIG-labeled DNA/mL of hybridization solution at
65°C. They were rinsed the following day under stringent conditions: two times
for 5 minutes at room temp in 2X SSC - 0.01% SDS and two times 30 minutes at
65°C in 0.1X SSC - 0.1% SDS. Filters were then processed following the
Boehringer Mannheim Biochemicals procedure for chemiluminescent detection
with Lumi-Phos 530 with modifications as described above. From the
autoradiograms of the duplicate filters, 11 hybridizing plaques were identified.
These plaques were picked from the original petri plate and plated out at a dilution
to yield about 1000 plaques per 80-mm plate. These plaques were absorbed to
nylon filters and re-probed using the same procedure. After autoradiography, two
of the original plaques, number 6-1 and number 10-1, showed hybridizing plaques.
These plaques were tested with the probe a third time, and well isolated plaques
were picked from each original. Following a fourth probing all the plaques
hybridized, indicating that pure clones had been isolated.

DNA was prepared from these two phage clones, λ 6-1 and λ 10-1, using
the protocol for plate lysate method [see Sambrook et al. (1989) Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press].
Restriction endonuclease digests and agarose gel electrophoresis showed the two
clones to be identical. The DNA fragments from the agarose gel were
"Southern-blotted" [see Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory Press] onto nylon filters and probed with
DIG-labeled corn CS cDNA as described above. A single 7.5 kb Sal I fragment
and two Xba I fragments of 3.6 kb and 3.2 kb hybridized to the probe. The 3.2 kb
Xba I fragment hybridized weakly to the probe whereas the 3.6 kb Xba I and the
7.5 kb Sal I fragments hybridized strongly.

The 7.5 kb Sal I fragment and the 3.6 kb and 3.2 kb Xba I fragments were
isolated from digests of the λ DNA run on an 0.7% low melting point (LMP)
agarose gel. The 7.5 kb, 3.6 kb and 3.2 kb bands were excised, melted, and
diluted into 0.5 M NaCl and loaded onto NACS columns, which were then washed
with 0.5 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA and the fragments eluted
with 2 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA. The 7.5 kb fragment was
ligated to the phagemid pGEM®-9Zf(-) (Promega, Madison, WI) which had been
cleaved with Sal I and treated with calf intestinal alkaline phosphatase [see
Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press] to prevent ligation of the phagemid to itself. Subclones
with this fragment in both orientations with respect to the pGEM®-9Zf(-) DNA
were obtained following transformation of E. coli. The 3.6 kb and 3.2 kb Xba I
fragments were similarly cloned into the Xba I site of pGEM®-9Zf(-) that had
been treated with calf intestinal alkaline phosphatase. Two subclones from each
Xba I fragment with the fragments in both orientations with respect to
pGEM®-9Zf(-) DNA were obtained following transformation of E. coli. The two
3.6 kb Xba I subclones were designated pFS1179 and pFS1180.

Restriction enzyme analysis of the subclones suggested that the 3.6 kb
Xba I fragment in pFS1179 and pFS1180 included the 5' region of the corn CS
gene. Preliminary sequence analysis of these clones using primers internal to the 5'
end of the cDNA confirmed that the clones contained the 5' end of the genomic CS
gene. The combined sequence and restriction enzyme analysis suggested that the
3.6 kb Xba I fragment contained the entire 5' region encoding the chloroplast
targeting signal as well as an additional approximately 800 bp of sequence in the
promoter region of the gene.

DNA from pFS1180 was sent to LARK Sequencing Technologies Inc.
(Houston, TX) for complete DNA sequencing analysis. The 3.6 kb Xba I
fragment was blunt-ended, cloned into the EcoR V site of pBluescript II SK+
(Stratagene, La Jolla, CA) and transformed into E. coli. Nested deletions were
generated from both the T7 and T3 ends using Exo III and S1 nuclease. Plasmid
DNA was prepared using a modified alkaline lysis procedure. Deletion clones
were size-selected for DNA sequencing by electrophoresis on agarose gels. DNA
sequencing was performed using standard dideoxynucleotide termination reactions
containing 7-deaza dGTP. 7-deaza dITP was used, if necessary, to resolve severe
GC band compressions. The label was [35S]dATP. Sequencing reactions were
analyzed on 6% polyacrylamide wedge gels containing 8 M urea. The entire
3639 bp Xba I fragment was sequenced (see SEQ ID NO:26).

Complete sequence analysis of the 3639 bp Xba I fragment revealed it
includes 806 bp of sequence upstream from the protein coding region and 2833 bp
of DNA encoding two-thirds of the corn CS protein. The 2833 bp includes seven
exons and seven introns with the 3' Xba I site located in the seventh intron.
Table 1 describes the location and length of exons and introns in the sequence as
well as the number of amino acids encoded by the exons. The first exon includes the
entire chloroplast targeting signal and 12 amino acids into the region that shows
amino acid sequence alignment with the E. coli protein (Figure 1). The last codon
in Exon 7 encodes amino acid 333 of corn CS as shown in SEQ ID NO:1.

TABLE 1

REGION      FROM bp    TO bp    LENGTH in bp    # AMINO ACIDS ENCODED
Promoter          1      806             806    na
Exon 1          807     1194             387    129
Intron 1       1195     1301             106    na
Exon 2         1302     1405             103    35
Intron 2       1406     1489              83    na
Exon 3         1490     1563              73    24
Intron 3       1564     1646              82    na
Exon 4         1647     1815             168    57
Intron 4       1816     2507             691    na
Exon 5         2508     2567              59    20
Intron 5       2568     2660              92    na
Exon 6         2661     2864             203    68
Intron 6       2865     2947              82    na
Exon 7         2948     3034              86    29
Intron 7       3035     3639            >604    na
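
The exon coordinates in Table 1 can be cross-checked against the amino acid counts. The sketch below is illustrative only; it assumes that FROM and TO are 1-based inclusive positions in SEQ ID NO:26 (an assumption, not stated in the text), and the function names are hypothetical. Under that assumption the LENGTH column corresponds to TO minus FROM, and the inclusive exon lengths add up to exactly three nucleotides per listed amino acid.

```python
# Exon rows copied from Table 1: (region, FROM bp, TO bp, amino acids encoded).
EXONS = [
    ("Exon 1", 807, 1194, 129),
    ("Exon 2", 1302, 1405, 35),
    ("Exon 3", 1490, 1563, 24),
    ("Exon 4", 1647, 1815, 57),
    ("Exon 5", 2508, 2567, 20),
    ("Exon 6", 2661, 2864, 68),
    ("Exon 7", 2948, 3034, 29),
]


def table_length(start: int, end: int) -> int:
    """Length as it appears in the LENGTH column (TO minus FROM)."""
    return end - start


def inclusive_length(start: int, end: int) -> int:
    """Number of bases when both end positions are counted (assumption)."""
    return end - start + 1


table_lengths = [table_length(s, e) for _, s, e, _ in EXONS]
coding_bp = sum(inclusive_length(s, e) for _, s, e, _ in EXONS)
amino_acids = sum(aa for *_, aa in EXONS)

print(table_lengths)               # [387, 103, 73, 168, 59, 203, 86]
print(coding_bp, amino_acids)      # 1086 362
print(coding_bp == 3 * amino_acids)  # True
```

Individual exons need not contain a whole number of codons, since a codon can be split by an intron; only the total is expected to be a multiple of three.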



Comparison of the corn CS cDNA sequence to the genomic CS DNA
sequence indicated that the cDNA of clone pFS1088 did not contain the entire
chloroplast targeting signal as anticipated. The cDNA was not truncated on the 5'
end, but contained a 170 bp deletion in the chloroplast transit sequence (Figure 2).
Southern blot analysis of genomic DNA from corn lines H99 and Missouri 17
confirmed that the sequence difference was due to a deletion in the cDNA. This
deletion placed the correct CS ATG initiator codon, which is located at
nucleotides 85-87 of SEQ ID NO:1, out of frame with the initiator codon of the
lacZ gene carried on the cloning vector. The cDNA sequence returned to the
proper CS coding frame at amino acid 62 near the 3' end of the deleted sequence.
Complementation of the metB mutation in BOB105 resulted from expression of a
fusion protein including 37 amino acids from β-galactosidase and the vector
polylinker plus 61 amino acids that are encoded by the corn CS sequence, but are
from the incorrect reading frame, for a total of 98 amino acids attached to the
amino terminus of the corn CS protein. Thus, the corn CS protein can tolerate
extra amino acids fused to its amino terminus without loss of function.
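
The reading-frame argument above comes down to two small calculations: a deletion whose length is not a multiple of three shifts the frame, and the fused amino-terminal extension is the sum of its two parts. The sketch below is illustrative; the names are hypothetical and the numbers are those given in the text.

```python
DELETION_BP = 170          # deletion in the chloroplast transit sequence of the cDNA
LACZ_POLYLINKER_AA = 37    # residues from beta-galactosidase and the vector polylinker
OUT_OF_FRAME_AA = 61       # residues read from corn CS sequence in the wrong frame


def frame_shift(deleted_bp: int) -> int:
    """Reading-frame offset (0, 1 or 2) introduced by a deletion of this length."""
    return deleted_bp % 3


def extension_length(*parts: int) -> int:
    """Total number of extra residues fused to the amino terminus."""
    return sum(parts)


print(frame_shift(DELETION_BP))                                # 2, so the frame changes
print(extension_length(LACZ_POLYLINKER_AA, OUT_OF_FRAME_AA))   # 98
```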

Comparison of the corn CS cDNA sequence 3' to the deletion region with
the genomic sequence (with introns removed) shows 96 percent identity.
Comparison of the two DNA sequences 5' to the deletion region shows 88%
identity. The deduced amino acid sequence of the open reading frame of the
cDNA 3' to the deleted sequence shows 99.3% similarity and 98.9% identity when
compared to the deduced amino acid sequence from the exons of the genomic CS
sequence. When the correct reading frame is translated from the cDNA 5' to the
deleted sequence the deduced amino acid sequence shows 100% identity to the
deduced amino acid sequence translated from the exons of the genomic CS
sequence in this region. The complete amino acid sequence of the corn CS protein
derived from combining the amino terminal sequence deduced from the corn
genomic DNA fragment of SEQ ID NO:26 and the carboxy terminal sequence
from the corn cDNA fragment of SEQ ID NO:1 is shown in SEQ ID NO:27.

EXAMPLE 2
Modification of the Corn CS Gene and
High level expression in E. coli
As indicated in Example 1, the open reading frame in plasmid pFS1088 for
the corn CS gene does not include an ATG initiator codon. Oligonucleotide
adaptors OTG145 and OTG146 were designed to add an initiator codon in frame
with the CS coding sequence.

OTG145 = SEQ ID NO:2:
AATTCATGAGTGCA

OTG146 = SEQ ID NO:3:
AATTTGCACTCATG

When annealed the oligonucleotides possess EcoR I sticky ends. Upon insertion
into pFS1088 in the desired orientation, an EcoR I site is present at the 5' end of
the adaptor, the ATG initiator codon is within a BspH I restriction endonuclease
site, and the EcoR I site at the 3' end of the adaptor is destroyed. The
oligonucleotides were ligated into EcoR I digested pFS1088, and insertion of the
correct sequence in the desired orientation was verified by DNA sequencing.
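
The properties claimed for the adaptor can be checked directly from the two oligonucleotide sequences. The sketch below is illustrative only: the helper names are hypothetical, and the flanking bases used to model the ligated junction ("G" on the left, "AATTC" on the right) are simply the two halves of a cut EcoR I site (G^AATTC), not sequence taken from pFS1088.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OTG145 = "AATTCATGAGTGCA"   # SEQ ID NO:2
OTG146 = "AATTTGCACTCATG"   # SEQ ID NO:3
OVERHANG = "AATT"           # 5' overhang left by EcoR I
ECORI_SITE = "GAATTC"
BSPHI_SITE = "TCATGA"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(top: str, bottom: str, overhang: str = OVERHANG) -> bool:
    """True if both oligos start with the overhang and the rest is complementary."""
    return (top.startswith(overhang) and bottom.startswith(overhang)
            and revcomp(top[len(overhang):]) == bottom[len(overhang):])


def ligated_top_strand(adaptor: str) -> str:
    """Top strand after ligating the adaptor into a cut EcoR I site."""
    return "G" + adaptor + "AATTC"


junction = ligated_top_strand(OTG145)
print(anneals(OTG145, OTG146))       # True
print(junction)                      # GAATTCATGAGTGCAAATTC
print(junction.count(ECORI_SITE))    # 1 (only the 5' EcoR I site survives)
print(junction.find(BSPHI_SITE), junction.find("ATG"))  # 4 6
```

The ATG at position 6 lies inside the BspH I site starting at position 4, and the 3' junction reads CAATTC rather than GAATTC, so that EcoR I site is lost.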
To achieve high level expression of the corn CS gene in E. coli the
bacterial expression vector pBT430 was used. This expression vector is a
derivative of pET-3a [Rosenberg et al. (1987) Gene 56:125-135] which employs
the bacteriophage T7 RNA polymerase/T7 promoter system. Plasmid pBT430
was constructed by first destroying the EcoR I and Hind III sites in pET-3a at their
original positions. An oligonucleotide adaptor containing EcoR I and Hind III
sites was inserted at the BamH I site of pET-3a. This created pET-3aM with
additional unique cloning sites for insertion of genes into the expression vector.
Then, the Nde I site at the position of translation initiation was converted to an
Nco I site using oligonucleotide-directed mutagenesis. The DNA sequence of
pET-3aM in this region, 5'-CATATGG, was converted to 5'-CCCATGG in
pBT430.

The corn CS gene was cut out of the modified pFS1088 plasmid described
above as an 1482 bp BspH I fragment and inserted into the expression vector
pBT430 digested with Nco I. Clones with the CS gene in the proper orientation
were identified by restriction enzyme mapping.
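
Inserting a BspH I fragment into an Nco I-cut vector works because both enzymes leave the same four-base 5' overhang. The sketch below illustrates this with the standard recognition sequences (BspH I T^CATGA, Nco I C^CATGG, Xba I T^CTAGA); the data structure and function names are hypothetical, and the model covers only palindromic sites that are cut to leave a 5' overhang.

```python
# Recognition site and cut position on the top strand (bases left of the cut).
SITES = {
    "BspH I": ("TCATGA", 1),
    "Nco I": ("CCATGG", 1),
    "Xba I": ("TCTAGA", 1),
}


def overhang(enzyme: str) -> str:
    """5' single-stranded overhang left by a palindromic 5'-overhang cutter."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """True if ends produced by the two enzymes can be ligated to each other."""
    return overhang(a) == overhang(b)


def junction(left_enzyme: str, right_enzyme: str) -> str:
    """Sequence formed when a left end from one enzyme joins a right end from another."""
    left_site, left_cut = SITES[left_enzyme]
    right_site, right_cut = SITES[right_enzyme]
    return (left_site[:left_cut] + overhang(left_enzyme)
            + right_site[len(right_site) - right_cut:])


print(overhang("BspH I"), overhang("Nco I"))   # CATG CATG
print(compatible("BspH I", "Nco I"))           # True
print(compatible("Nco I", "Xba I"))            # False
print(junction("Nco I", "BspH I"))             # CCATGA
```

The hybrid junction CCATGA still contains the ATG but matches neither recognition site, so neither enzyme recuts it after ligation.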

For high level expression each of the plasmids was transformed into E. coli
strain BL21(DE3) or BL21(DE3)lysS [Studier et al. (1986) J. Mol. Biol.
189:113-130]. Cultures were grown in LB medium containing ampicillin
(100 mg/L) at 37°C. At an optical density at 600 nm of approximately 1, IPTG
(isopropylthio-β-galactoside, the inducer) was added to a final concentration of
0.4 mM and incubation was continued overnight. The cells were collected by
centrifugation and resuspended in 1/20th the original culture volume in 50 mM
NaCl; 50 mM Tris-Cl, pH 7.5; 1 mM EDTA, and frozen at -20°C. Frozen aliquots
of 1 mL were thawed at 37°C and sonicated, in an ice-water bath, to lyse the cells.
The lysate was centrifuged at 4°C for 5 min at 12,000 rpm. The supernatant was
removed and the pellet was resuspended in 1 mL of the above buffer.

The supernatant and pellet fractions of uninduced and IPTG-induced
cultures were analyzed by SDS polyacrylamide gel electrophoresis. The best of
the conditions tested was the induced culture of the BL21(DE3)lysS host. The
major protein visible by Coomassie blue staining in the pellet fraction of this
induced culture had a molecular weight of about 54 kd, the expected size for corn
CS.

EXAMPLE 3
Isolation of the E. coli lysC Gene and mutations
in lysC resulting in lysine-insensitive AKIII
The E. coli lysC gene has been cloned, restriction endonuclease mapped
and sequenced previously [Cassan et al. (1986) J. Biol. Chem. 261:1052-1057].
For the present invention the lysC gene was obtained on a bacteriophage lambda
clone from an ordered library of 3400 overlapping segments of cloned E. coli
DNA constructed by Kohara, Akiyama and Isono [Kohara et al. (1987) Cell
50:595-508]. This library provides a physical map of the whole E. coli
chromosome and ties the physical map to the genetic map. From the knowledge of
the map position of lysC at 90 min. on the E. coli genetic map [Theze et al. (1974)
J. Bacteriol. 117:133-143], the restriction endonuclease map of the cloned gene
[Cassan et al. (1986) J. Biol. Chem. 261:1052-1057], and the restriction
endonuclease map of the cloned DNA fragments in the E. coli library [Kohara et
al. (1987) Cell 50:595-508], it was possible to choose lambda phages 4E5 and 7A4
[Kohara et al. (1987) Cell 50:595-508] as likely candidates for carrying the lysC
gene. The phages were grown in liquid culture from single plaques as described
[see Current Protocols in Molecular Biology (1987) Ausubel et al. eds. John Wiley
& Sons New York] using LE392 as host [see Sambrook et al. (1989) Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press]. Phage
DNA was prepared by phenol extraction as described [see Current Protocols in
Molecular Biology (1987) Ausubel et al. eds. John Wiley & Sons New York].

From the sequence of the gene several restriction endonuclease fragments
diagnostic for the lysC gene were predicted, including an 1860 bp EcoR I-Nhe I
fragment, a 2140 bp EcoR I-Xmn I fragment and a 1600 bp EcoR I-BamH I
fragment. Each of these fragments was detected in both of the phage DNAs,
confirming that these carried the lysC gene. The EcoR I-Nhe I fragment was
isolated and subcloned in plasmid pBR322 digested with the same enzymes,
yielding an ampicillin-resistant, tetracycline-sensitive E. coli transformant. The
plasmid was designated pBT436.

To establish that the cloned lysC gene was functional, pBT436 was
transformed into E. coli strain Gif106M1 (E. coli Genetic Stock Center strain
CGSC-5074) which has mutations in each of the three E. coli AK genes [Theze et
al. (1974) J. Bacteriol. 117:133-143]. This strain lacks all AK activity and
therefore requires diaminopimelate (a precursor to lysine which is also essential for
cell wall biosynthesis), threonine and methionine. In the transformed strain all
these nutritional requirements were relieved, demonstrating that the cloned lysC
gene encoded functional AKIII.
Addition of lysine (or diaminopimelate which is readily converted to lysine
in vivo) at a concentration of approximately 0.2 mM to the growth medium
inhibits the growth of Gif106M1 transformed with pBT436. M9 media [see
Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press] supplemented with the arginine and isoleucine, required
for Gif106M1 growth, and ampicillin, to maintain selection for the pBT436
plasmid, was used. This inhibition is reversed by addition of threonine plus
methionine to the growth media. These results indicated that AKIII could be
inhibited by exogenously added lysine leading to starvation for the other amino
acids derived from aspartate. This property of pBT436-transformed Gif106M1
was used to select for mutations in lysC that encoded lysine-insensitive AKIII.

Single colonies of Gif106M1 transformed with pBT436 were picked and
resuspended in 200 µL of a mixture of 100 µL 1% lysine plus 100 µL of M9
media. The entire cell suspension containing 10^*10^ cells was spread on a petri
dish containing M9 media supplemented with the arginine, isoleucine, and
ampicillin. Sixteen petri dishes were thus prepared. From 1 to 20 colonies
appeared on 11 of the 16 petri dishes. One or two (if available) colonies were
picked and retested for lysine resistance and from this nine lysine-resistant clones
were obtained. Plasmid DNA was prepared from eight of these and
re-transformed into Gif106M1 to determine whether the lysine resistance determinant
was plasmid-borne. Six of the eight plasmid DNAs yielded lysine-resistant
colonies. Three of these six carried lysC genes encoding AKIII that was
uninhibited by 15 mM lysine, whereas wild type AKIII is 50% inhibited by
0.3-0.4 mM lysine and >90% inhibited by 1 mM lysine (see Example 2 for details).

To determine the molecular basis for lysine-resistance the sequences of the
wild type lysC gene and three mutant genes were determined. The sequence of the
wild type lysC gene cloned in pBT436 (SEQ ID NO:4) differed from the published
lysC sequence in the coding region at 5 positions. Four of these nucleotide
differences were at the third position in a codon and would not result in a change
in the amino acid sequence of the AKIII protein. One of the differences would
result in a cysteine to glycine substitution at amino acid 58 of AKIII. These
differences are probably due to the different strains from which the lysC genes
were cloned.

The sequences of the three mutant lysC genes that encoded lysine-
insensitive AK each differed from the wild type sequence by a single nucleotide,
resulting in a single amino acid substitution in the protein. Mutant M2 had an A
substituted for a G at nucleotide 954 of SEQ ID NO:4 resulting in an isoleucine
for methionine substitution at amino acid 318 and mutants M3 and M4 had
identical T for C substitutions at nucleotide 1055 of SEQ ID NO:4 resulting in an
isoleucine for threonine substitution at amino acid 352. Thus, either of these single
amino acid substitutions is sufficient to render the AKIII enzyme insensitive to
lysine inhibition.
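
The relationship between the mutated nucleotides and the affected residues can be reproduced with a small calculation. The sketch below is illustrative: it assumes that the coding sequence begins at nucleotide 1 of SEQ ID NO:4 (an assumption that is consistent with the nucleotide and amino acid numbers quoted above, but not stated here), and the helper names are hypothetical. Only the codons needed for the check are listed.

```python
# Subset of the standard genetic code needed for this check.
CODONS = {
    "ATG": "Met", "ATA": "Ile", "ATT": "Ile", "ATC": "Ile",
    "ACT": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
}


def codon_position(nt: int, cds_start: int = 1) -> tuple:
    """Return (codon number, position within codon), both 1-based."""
    offset = nt - cds_start
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon: str, pos: int, base: str) -> str:
    """Replace the base at 1-based position `pos` of a codon."""
    return codon[:pos - 1] + base + codon[pos:]


print(codon_position(954))    # (318, 3)  M2: G to A
print(codon_position(1055))   # (352, 2)  M3 and M4: C to T

# M2: the only Met codon, ATG, becomes ATA (Ile) when its third base changes G to A.
print(CODONS[substitute("ATG", 3, "A")])   # Ile

# M3/M4: a Thr codon becomes Ile after C to T at position 2 unless it ends in G.
print([c for c in ("ACT", "ACC", "ACA", "ACG")
       if CODONS[substitute(c, 2, "T")] == "Ile"])   # ['ACT', 'ACC', 'ACA']
```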

An Nco I (CCATGG) site was inserted at the translation initiation codon of
the lysC gene using the following oligonucleotides:

SEQ ID NO:5:
GATCCATGGC TGAAATTGTT GTCTCCAAAT TTGGCG

SEQ ID NO:6:
GTACCGCCAA ATTTGGAGAC AACAATTTCA GCCATG

When annealed these oligonucleotides have BamH I and Asp 718 "sticky" ends.
The plasmid pBT436 was digested with BamH I, which cuts upstream of the lysC
coding sequence, and Asp 718, which cuts 31 nucleotides downstream of the
initiation codon. The annealed oligonucleotides were ligated to the plasmid
vector and E. coli transformants were obtained. Plasmid DNA was prepared and
screened for insertion of the oligonucleotides based on the presence of an Nco I
site. A plasmid containing the site was sequenced to assure that the insertion was
correct, and was designated pBT457. In addition to creating an Nco I site at the
initiation codon of lysC, this oligonucleotide insertion changed the second codon
from TCT, coding for serine, to GCT, coding for alanine. This amino acid
substitution has no apparent effect on the AKIII enzyme activity.
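
The oligonucleotides of SEQ ID NO:5 and SEQ ID NO:6 can be checked for the stated properties: complementary cores, BamH I (GATC) and Asp 718 (GTAC) overhangs, an Nco I site at the ATG, and alanine as the second codon. The sketch below is illustrative; the function names are hypothetical and the codon table is the standard genetic code.

```python
# Standard genetic code built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ5 = "GATCCATGGCTGAAATTGTTGTCTCCAAATTTGGCG"   # SEQ ID NO:5
SEQ6 = "GTACCGCCAAATTTGGAGACAACAATTTCAGCCATG"   # SEQ ID NO:6


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`, ignoring leftover bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(SEQ5[:4], SEQ6[:4])                 # GATC GTAC (BamH I and Asp 718 overhangs)
print(revcomp(SEQ5[4:]) == SEQ6[4:])      # True
print(SEQ5.find("CCATGG"))                # 3 (Nco I site)
print(translate(SEQ5[SEQ5.find("ATG"):]))  # MAEIVVSKFG
print(CODON_TABLE["TCT"], CODON_TABLE["GCT"])  # S A
```

The second residue of the translated oligonucleotide is alanine (GCT), where the wild type codon TCT would give serine.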

The lysC gene was cut out of plasmid pBT457 as a 1560 bp Nco I-EcoR I
fragment and inserted into the expression vector pBT430 digested with the same
enzymes, yielding plasmid pBT461. For expression of the mutant lysC-M4 gene
pBT461 was digested with Kpn I-EcoR I, which removes the wild type lysC gene
from about 30 nucleotides downstream from the translation start codon, and the
analogous Kpn I-EcoR I fragment from the mutant gene was inserted, yielding
plasmid pBT492.
EXAMPLE 4
Molecular Cloning of Corn Genes Encoding
Methionine-Rich Seed Storage Proteins
A high methionine 10 kD zein gene [Kirihara et al. (1988) Mol. Gen. Genet.
211:477-484] was isolated from corn genomic DNA using PCR. Two
oligonucleotides 30 bases long flanking this gene were synthesized using an
Applied Biosystems DNA synthesizer. Oligomer SM56 (SEQ ID NO:7) codes for
the positive strand spanning the first ten amino acids:

SM56 5'-ATGGCAGCCA AGATGCTTGC ATTGTTCGCT-3' (SEQ ID NO:7)

Oligomer CFC77 (SEQ ID NO:8) codes for the negative strand spanning the last
ten amino acids:

CFC77 5'-GAATGCAGCA CCAACAAAGG GTTGCTGTAA-3' (SEQ ID NO:8)

These were employed to generate by polymerase chain reaction (PCR) the 10 kD
coding region using maize genomic DNA from strain B85 as the template. PCR
was performed using a Perkin-Elmer Cetus kit according to the instructions of the
vendor on a thermocycler manufactured by the same company. The reaction
product when run on a 1% agarose gel and stained with ethidium bromide showed
a strong DNA band of the size expected for the 10 kD zein gene, 450 bp, with a
faint band at about 650 bp. The 450 bp band was electro-eluted onto DEAE
cellulose membrane (Schleicher & Schuell) and subsequently eluted from the
membrane at 65°C with 1 M NaCl, 0.1 mM EDTA, 20 mM Tris-Cl, pH 8.0. The
DNA was ethanol precipitated and rinsed with 70% ethanol and dried. The dried
pellet was resuspended in 10 µL water and an aliquot (usually 1 µL) was used for
another set of PCR reactions, to generate by asymmetric priming single-stranded
linear DNAs. For this, the primers SM56 and CFC77 were present in a 1:20 molar
ratio and 20:1 molar ratio. The products, both positive and negative strands of the
10 kD zein gene, were phenol extracted, ethanol precipitated, and passed through
NACS (Bethesda Research Laboratories) columns to remove the excess
oligomers. The eluates were ethanol precipitated twice, rinsed with 70% ethanol,
and dried. DNA sequencing was done using the appropriate complementary
primers and a Sequenase kit from United States Biochemicals Company according
to the vendor's instructions. The sequence deviated from the published coding
sequence (Kirihara et al., Gene, 71:359-370 (1988)) in one base pair at nucleotide
position 1504 of the published sequence. An A was changed to a G which resulted
in the change of amino acid 123 (with the initiator methionine as amino acid 1)
from Gln to Arg. It is not known if the detected mutation was generated during
the PCR reaction or if this is another allele of the maize 10 kD zein gene. A
radioactive probe was made by nick-translation of the PCR-generated 10 kD zein
gene using 32P-dCTP and a nick-translation kit purchased from Bethesda Research
Laboratories.
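
The reported Gln to Arg change is consistent with a single A to G substitution at the second position of the codon. The sketch below enumerates the possibilities using the standard genetic code; the function names are hypothetical, and the coding-sequence coordinate it prints is counted from the A of the initiator ATG, not from the numbering of the published sequence.

```python
GLN_CODONS = ("CAA", "CAG")
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def a_to_g_changes(codon: str) -> list:
    """All single A-to-G substitutions as (1-based position, new codon)."""
    return [(i + 1, codon[:i] + "G" + codon[i + 1:])
            for i, base in enumerate(codon) if base == "A"]


def cds_coordinate(residue: int, codon_pos: int) -> int:
    """Nucleotide number within the coding sequence (A of ATG is 1)."""
    return 3 * (residue - 1) + codon_pos


for gln in GLN_CODONS:
    hits = [(pos, new) for pos, new in a_to_g_changes(gln) if new in ARG_CODONS]
    print(gln, hits)
# CAA [(2, 'CGA')]
# CAG [(2, 'CGG')]

print(cds_coordinate(123, 2))   # 368
```

For either glutamine codon, only the substitution at the second position yields arginine.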

A genomic library of corn in bacteriophage lambda was purchased from
Clontech (Palo Alto, CA). Data sheets from the supplier indicated that the corn
DNA was from seven-day-old seedlings grown in the dark. The vector was
λ-EMBL-3 carrying BamH I fragments 15 kb in average size. A titer of 1 to 9 x
10^^ plaque forming units (pfu)/mL was indicated by the supplier. Upon its arrival
the library was titered and contained 2.5 x 10^ pfu/mL.

The protocol for screening the library by DNA hybridization was provided
by the vendor. About 30,000 pfu were plated per 150-mm plate on a total of
15 Luria Broth (LB) agar plates giving 450,000 plaques. Plating was done using
E. coli LE392 grown in LB + 0.2% maltose as the host and LB-0.7% agarose as
the plating medium. The plaques were absorbed onto nitrocellulose filters
(Millipore HATF, 0.45 µm pore size), denatured in 0.5 M NaOH, neutralized in
1.5 M NaCl, 0.5 M Tris-Cl pH 7.5, and rinsed in 3X SSC [Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press].
The filters were blotted on Whatman 3MM paper and heated in a vacuum oven at
80°C for two hours to allow firm anchorage of phage DNA in the membranes.

The 32P-labelled 10 kD zein DNA fragment was used as a hybridization
probe to screen the library. The fifteen 150-mm nitrocellulose filters carrying the λ
phage plaques were screened using radioactive 10 kD gene probe. After four
hours prehybridizing at 60°C in 50X SSPE, 5X Denhardt's [see Sambrook et al.
(1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory
Press], 0.1% SDS, 100 µg/mL calf thymus DNA, the filters were transferred to
fresh hybridization mix containing the denatured radiolabeled 10 kD zein gene
(cpm/mL) and stored overnight at 60°C. They were rinsed the following day
under stringent conditions: one hour at room temp in 2X SSC - 0.05% SDS and
one hour at 68°C in 1X SSC - 0.1% SDS. Blotting on 3MM Whatman paper
followed, then air drying and autoradiography at -70°C with Kodak XAR-5 films
with DuPont Cronex® Lightning Plus intensifying screens. From these
autoradiograms, 20 hybridizing plaques were identified. These plaques were
picked from the original petri plate and plated out at a dilution to yield about 100
plaques per 80-mm plate. These plaques were absorbed to nitrocellulose filters
and re-probed using the same procedure. After autoradiography only one of the
original plaques, number 10, showed two hybridizing plaques. These plaques were
tested with the probe a third time; all the progeny plaques hybridized, indicating
that pure clones had been isolated.

DNA was prepared from these two phage clones, λ 10-1 and λ 10-2, using the
protocol for DNA isolation from small-scale liquid λ-phage lysates (Ausubel et al.
(1987) Current Protocols in Molecular Biology, pp. 1.12.2, 1.13.5-6). Restriction
endonuclease digests and agarose gel electrophoresis showed the two clones to be
identical. The DNA fragments from the agarose gel were "Southern-blotted" [see
Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press] onto nitrocellulose membrane filters and probed with
radioactively-labeled 10 kD zein DNA generated by nick translation. A single
7.5 kb BamH I fragment and a single 1.4 kb Xba I fragment hybridized to the
probe.

The 7.5 kb BamH I fragment was isolated from a BamH I digest of the λ
DNA run on an 0.5% low melting point (LMP) agarose gel. The 7.5 kb band was
excised, melted, and diluted into 0.5 M NaCl and loaded onto a NACS column,
which was then washed with 0.5 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA
and the fragment eluted with 2 M NaCl, 10 mM Tris-Cl, pH 7.2, 1 mM EDTA.
This fragment was ligated to the phagemid pTZ18R (Pharmacia) which had been
cleaved with BamH I and treated with calf intestinal alkaline phosphatase [see
Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press] to prevent ligation of the phagemid to itself. Subclones
with these fragments in both orientations with respect to the pTZ18R DNA were
obtained following transformation of E. coli.

An Xba I digest of the cloned λ phage DNA was run on an 0.8% agarose
gel and a 1.4 kb fragment was isolated using DEAE cellulose membrane (same
procedure as for the PCR-generated 10 kD zein DNA fragment described above).
This fragment was ligated to pTZ18R cut with Xba I in the same way as described
above. Subclones with these fragments in both orientations with respect to the
pTZ18R DNA, designated pX8 and pX10, were obtained following transformation
of E. coli. Single-stranded DNAs were made from the subclones using the
protocol provided by Pharmacia. The entire 1.4 kb Xba I fragments were
sequenced. An additional 700 bases adjacent to the Xba I fragment were sequenced
from the BamH I fragment in clone pB3 (fragment pB3 is in the same orientation
as pX8) giving a total of 2123 bases of sequence (SEQ ID NO:9).

Encoded on this fragment is another methionine-rich zein, which is related
to the 10 kD zein and has been designated High Sulfur Zein (HSZ) [see PCT/US
92/00958]. From the deduced amino acid sequence of the protein, its molecular
weight is approximately 21 kD and it is about 38% methionine by weight.

EXAMPLE 5
Modification of the HSZ Gene by
Site-Directed Mutagenesis

Three Nco I sites were present in the 1.4 kb Xba I fragment carrying the
HSZ gene, all in the HSZ coding region. It was desirable to maintain only one of
these sites (nucleotides 751-756 in SEQ ID NO:9) that included the translation
start codon. Therefore, the Nco I sites at positions 870-875 and 1333-1338 were
eliminated by oligonucleotide-directed site-specific mutagenesis [see Sambrook et
al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press]. The oligonucleotides synthesized for the mutagenesis were:

CFC99 ATGAACCCTT GGATGCA (SEQ ID NO:10)

CFC98 CCCACAGCAA TGGCGAT (SEQ ID NO:11)



Mutagenesis was carried out using a kit purchased from Bio-Rad (Richmond, CA),
following the protocol provided by the vendor.

The process changed the A to T at 872 and the C to A at 1334. These
were both at the third position of their respective codons and resulted in no change
in the amino acid sequence encoded by the gene, with CCA to CCT, still coding
for Pro, and GCC to GCA, still coding for Ala. The plasmid clone containing
the modified HSZ gene with a single Nco I site at the ATG start codon was
designated pX8m. Because the native HSZ gene has a unique Xba I site at the
stop codon of the gene (1384-1389, SEQ ID NO:9), a complete digest of the
DNA with Nco I and Xba I yields a 637 bp fragment containing the entire coding
sequence of the precursor HSZ polypeptide (SEQ ID NO:12).
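
The two silent changes can be verified from the coordinates alone. The sketch below is illustrative; it assumes that the ATG start codon occupies nucleotides 753-755, which follows from the Nco I site (CCATGG) at 751-756 containing the start codon, and the function names are hypothetical. Only the four codons named in the text are listed.

```python
NCO_I = "CCATGG"
ATG_START = 753   # A of the ATG inside the Nco I site at 751-756 (inferred)
CODONS = {"CCA": "Pro", "CCT": "Pro", "GCC": "Ala", "GCA": "Ala"}


def codon_position(nt: int, start: int = ATG_START) -> tuple:
    """Return (codon number, position within codon), both 1-based."""
    offset = nt - start
    return offset // 3 + 1, offset % 3 + 1


def mutate_site(site: str, site_start: int, nt: int, base: str) -> str:
    """Apply a single base change at absolute coordinate `nt` to a site."""
    i = nt - site_start
    return site[:i] + base + site[i + 1:]


print(codon_position(872))    # (40, 3)
print(codon_position(1334))   # (194, 3)

site_870 = mutate_site(NCO_I, 870, 872, "T")     # A to T at 872
site_1333 = mutate_site(NCO_I, 1333, 1334, "A")  # C to A at 1334
print(site_870, site_1333)                        # CCTTGG CAATGG
print(site_870 != NCO_I and site_1333 != NCO_I)   # True, both sites are lost

print(CODONS["CCA"] == CODONS["CCT"], CODONS["GCC"] == CODONS["GCA"])  # True True
```

Both substitutions fall on the third base of a codon and leave the encoded residue unchanged while removing the Nco I recognition sequence.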
It was desirable to create a form of the HSZ gene with alternative unique
restriction endonuclease sites just past the end of the coding region. To do this,
oligonucleotides CFC104 (SEQ ID NO:13) and CFC105 (SEQ ID NO:14):

CFC104    5'-CTAGCCCGGGTAC-3'    (SEQ ID NO:13)
CFC105    3'-GGGCCCATGGATC-5'    (SEQ ID NO:14)

were annealed and ligated into the Xba I site, introducing two new restriction sites,
Sma I and Kpn I, and destroying the Xba I site. The now unique Xba I site from
nucleotide 1-6 in SEQ ID NO:9 and the Ssp I site from nucleotide 1823-1828 in
SEQ ID NO:9 were used to obtain a fragment that included the HSZ coding
region plus its 5' and 3' regulatory regions. This fragment was cloned into the
commercially available vector pTZ19R (Pharmacia) digested with Xba I and
Sma I, yielding plasmid pCC10.
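The effect of the CFC104/CFC105 adaptor on the Xba I site can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the vector arms are reduced to the bases of a cut Xba I site (T^CTAGA), and the helper names (`duplex_ok`, `ligated_top_strand`, `sites_present`) are hypothetical. All three recognition sequences are palindromic, so searching one strand is sufficient.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


CFC104 = "CTAGCCCGGGTAC"          # written 5'->3' in the text
CFC105 = "GGGCCCATGGATC"[::-1]    # text gives 3'->5'; reversed here to 5'->3'

SITES = {"Xba I": "TCTAGA", "Sma I": "CCCGGG", "Kpn I": "GGTACC"}


def duplex_ok(top, bottom, overhang=4):
    """True if the two strands pair everywhere except the 5' overhangs."""
    return top[overhang:] == revcomp(bottom[overhang:])


def ligated_top_strand(insert_strand):
    """Top strand of an Xba I-cut site (T^CTAGA) after the adaptor is ligated in."""
    return "T" + insert_strand + "CTAGA"


def sites_present(seq):
    """Map each enzyme name to whether its site occurs in seq."""
    return {name: site in seq for name, site in SITES.items()}


# Both overhangs are CTAG, so the adaptor can ligate in either orientation.
forward = ligated_top_strand(CFC104)
reverse = ligated_top_strand(CFC105)
print(duplex_ok(CFC104, CFC105))
print(forward, sites_present(forward))
print(reverse, sites_present(reverse))
```

In either orientation the junctions no longer read TCTAGA, while CCCGGG (Sma I) and GGTACC (Kpn I) are present, in agreement with the description above.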

It was desirable to create an altered form of the HSZ gene with a unique
restriction endonuclease site at the start of the mature protein, i.e., with the amino
terminal signal sequence removed. To accomplish this a DNA fragment was
generated using PCR as described in Example 1. Template DNA for the PCR
reaction was plasmid pX8m. Oligonucleotide primers for the reaction were:

CFC106    5'-CCACT TCATGA CCXiATATCCCIAGGGCACTT-3'    (SEQ ID NO:15)

CFC88     5'-TTCT ATCTAGA ATGCAGCACCAACAAAGGG-3'    (SEQ ID NO:16)

The CFC106 (SEQ ID NO:15) oligonucleotide provided the PCR-generated
fragment with a BspH I site (underlined), which when digested with BspH I results
in a cohesive end identical to that generated by an Nco I digest. This site was
located at the junction of the signal sequence and the mature HSZ coding
sequence. The CFC88 (SEQ ID NO:16) oligonucleotide provided the PCR-generated
fragment with an Xba I site (underlined) at the translation terminus of
the HSZ gene. The BspH I-Xba I fragment (SEQ ID NO:17) obtained by
digestion of the PCR-generated fragment encodes the mature form of HSZ with
the addition of a methionine residue at the amino terminus of the protein to permit
initiation of translation.
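The compatibility of BspH I and Nco I ends follows from their cut positions (T^CATGA and C^CATGG). The sketch below is illustrative only; the dictionary layout and the function names `overhang` and `junction` are assumptions made for this example.

```python
# Recognition site and number of bases before the cut on each strand.
# Both enzymes cut after the first base: C^CATGG and T^CATGA.
ENZYMES = {"Nco I": ("CCATGG", 1), "BspH I": ("TCATGA", 1)}


def overhang(enzyme):
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left_enzyme, right_enzyme):
    """Sequence formed when the left half of one cut site joins the right half of another."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


print(overhang("Nco I"), overhang("BspH I"))
hybrid = junction("Nco I", "BspH I")
print(hybrid)
```

Both digests leave a CATG overhang. Joining an Nco I-cut vector end to a BspH I-cut fragment end gives a hybrid junction that retains the ATG but is recut by neither enzyme.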

Construction of Chimeric Genes for
Expression of Corn CS, E. coli AKIII-M4,
and HSZ Proteins in the Embryo and Endosperm
of Transformed Corn
The following chimeric genes were made for transformation into corn:

globulin 1 promoter/mcts/lysC-M4/globulin 1 3' region
globulin 1 promoter/corn CS coding region/globulin 1 3' region
glutelin 2 promoter/mcts/lysC-M4/NOS 3' region
glutelin 2 promoter/corn CS coding region/10 kD 3' region
10 kD promoter/HSZ coding region/10 kD 3' region
glutelin 2 promoter/HSZ coding region/10 kD 3' region

A gene expression cassette employing the 10 kD zein regulatory sequences
includes about 925 nucleotides upstream (5') from the translation initiation codon
and about 945 nucleotides downstream (3') from the translation stop codon. The
entire cassette is flanked by an EcoR I site at the 5' end and BamH I, Sal I and
Hind III sites at the 3' end. The DNA sequences of these regulatory regions have
been described in the literature [Kirihara et al. (1988) Gene 71:359-370] and DNA
fragments carrying these regulatory sequences were obtained from corn genomic
DNA via PCR. Between the 5' and 3' regions is a unique Nco I site, which
includes the ATG translation initiation codon. The oligonucleotides CFC104
(SEQ ID NO:13) and CFC105 (SEQ ID NO:14) (see Example 5) were inserted at
the Xba I site near the 10 kD zein translation stop codon, thus adding a unique
Sma I site. An Nco I-Sma I fragment containing the HSZ coding region was
isolated from plasmid pCC10 (see Example 5) and inserted into the Nco I-Sma I
digested 10 kD zein expression cassette, creating the chimeric gene: 10 kD
promoter/HSZ coding region/10 kD 3' region.
The glutelin 2 promoter was cloned from corn genomic DNA using PCR
with primers based on the published sequence [Reina et al. (1990) Nucleic Acids
Res. 18:6426-6426]. The promoter fragment includes 1020 nucleotides upstream
from the ATG translation start codon. An Nco I site was introduced via PCR at
the ATG start site to allow for direct translational fusions. A BamH I site was
introduced on the 5' end of the promoter. The 1.02 kb BamH I to Nco I promoter
fragment was linked to an Nco I to Hind III fragment carrying the HSZ coding
region/10 kD 3' region described above, yielding the chimeric gene: glutelin 2
promoter/HSZ coding region/10 kD 3' region in a plasmid designated pML103.
The globulin 1 promoter and 3' sequences were isolated from a Clontech
corn genomic DNA library using oligonucleotide probes based on the published
sequence of the globulin 1 gene [Kriz et al. (1989) Plant Physiol. 91:636]. The
cloned segment includes the promoter fragment extending 1078 nucleotides
upstream from the ATG translation start codon, the entire globulin coding
sequence including introns and the 3' sequence extending 803 bases from the
translational stop. To allow replacement of the globulin 1 coding sequence with
other coding sequences an Nco I site was introduced at the ATG start codon, and
Kpn I and Xba I sites were introduced following the translational stop codon via
PCR to create vector pCC50. There is a second Nco I site within the globulin 1
promoter fragment. The globulin 1 gene cassette is flanked by Hind III sites.

Plant amino acid biosynthetic enzymes are known to be localized in the
chloroplasts and therefore are synthesized with a chloroplast targeting signal.
Bacterial proteins such as AKIII have no such signal. A chloroplast transit
sequence (cts) was therefore fused to the lysC-M4 coding sequence in the chimeric
genes described below. For corn the cts used was based on the cts of the small
subunit of ribulose 1,5-bisphosphate carboxylase from corn [Lebrun et al. (1987)
Nucleic Acids Res. 15:4360] and is designated mcts. The oligonucleotides SEQ
ID NOS:18-23 were synthesized and used to attach the mcts to lysC-M4.

Oligonucleotides SEQ ID NO:18 and SEQ ID NO:19, which encode the
carboxy terminal part of the corn chloroplast targeting signal, were annealed,
resulting in Xba I and Nco I compatible ends, purified via polyacrylamide gel
electrophoresis, and inserted into Xba I plus Nco I digested pBT492 (see
Example 3). The insertion of the correct sequence was verified by DNA
sequencing yielding pBT556. Oligonucleotides SEQ ID NO:20 and SEQ ID
NO:21, which encode the middle part of the chloroplast targeting signal, were
annealed, resulting in Bgl II and Xba I compatible ends, purified via
polyacrylamide gel electrophoresis, and inserted into Bgl II and Xba I digested
pBT556. The insertion of the correct sequence was verified by DNA sequencing
yielding pBT557. Oligonucleotides SEQ ID NO:22 and SEQ ID NO:23, which
encode the amino terminal part of the chloroplast targeting signal, were annealed,
resulting in Nco I and Afl II compatible ends, purified via polyacrylamide gel
electrophoresis, and inserted into Nco I and Afl II digested pBT557. The insertion
of the correct sequence was verified by DNA sequencing yielding pBT558. Thus
the mcts was fused to the lysC-M4 gene.

To construct the chimeric gene: globulin 1
promoter/mcts/lysC-M4/globulin 1 3' region an Nco I to Hpa I fragment
containing the mcts/lysC-M4 coding sequence was isolated from plasmid pBT558
and inserted into Nco I plus Sma I digested pCC50 creating plasmid pBTG63.

To construct the chimeric gene: glutelin 2 promoter/mcts/lysC-M4/NOS 3'
region the 1.02 kb BamH I to Nco I glutelin 2 promoter fragment described above
was linked to the Nco I to Hpa I fragment containing the mcts/lysC-M4 coding
sequence described above and to a Sma I to Hind III fragment carrying the NOS 3'
region creating.

To construct the chimeric gene: globulin 1 promoter/corn CS coding
region/globulin 1 3' region a 1482 base pair BspH I fragment containing the corn
CS coding region (see Example 2) was isolated and inserted into an Nco I partial
digest of pCC50. A plasmid designated pML157 carried the CS coding region in
the proper orientation to create the indicated chimeric gene, as determined via
restriction endonuclease digests.
To construct the chimeric gene: glutelin 2 promoter/corn CS coding
region/10 kD 3' region the HSZ coding region was removed from pML103
(above) by digestion with Nco I and Xma I and insertion of an oligonucleotide
adaptor containing an EcoR I site and Nco I and Xma I sticky ends. The resulting
plasmid was digested with Nco I and the 1482 base pair BspH I fragment
containing the corn CS coding region (see above and Example 2) was inserted. A
plasmid designated pML159 with the CS coding region in the proper orientation,
as determined via restriction endonuclease digests, was obtained, creating the
indicated chimeric gene.
A corn CS gene that contained the entire chloroplast targeting signal was
constructed by fusing the 5' end of the genomic CS gene to the 3' end of the
cDNA. A 697 bp Nco I to Sph I genomic DNA fragment (see SEQ ID NO:26)
replaced the analogous Nco I to Sph I fragment in the cDNA. Thus, the first 168
amino acids are encoded by the genomic CS sequence and the coding sequence is
interrupted by two introns. The remaining 341 amino acids are encoded by cDNA
CS sequence with no further introns, resulting in a protein of 509 amino acids in
length (SEQ ID NO:26). A 1750 bp Nco I to BspH I DNA fragment that includes
the entire CS coding region was inserted into the corn embryo and endosperm
expression cassettes resulting in the chimeric genes globulin 1 promoter/corn CS
coding region/globulin 1 3' region in plasmid pFS1198 and glutelin 2
promoter/corn CS coding region/10 kD zein 3' region in plasmid pFS1196,
respectively.

EXAMPLE 7
Isolation of the E. coli metL Gene and
Construction of Chimeric Genes for Expression
in the Embryo and Endosperm of Transformed Corn
The metL gene of E. coli encodes a bifunctional protein, AKII-HDHII; the
AK and HDH activities of this enzyme are insensitive to all pathway end-products.
The metL gene of E. coli has been isolated and sequenced previously [Zakin et al.
(1983) J. Biol. Chem. 258:3028-3031]. For the present invention a DNA fragment
containing the metL gene was isolated and modified from E. coli genomic DNA
obtained from strain LE392 using PCR. The following PCR primers were
designed and synthesized:

CF23 = SEQ ID NO:24:

5'-GAAACCATGG CCAGTGTGAT TGCGCAGGCA

CF24 = SEQ ID NO:25:

5'-GAAAGGTACC TTACAACAAC TGTGCCAGC

These primers add an Nco I site which includes a translation initiation codon at the
amino terminus of the AKII-HDHII protein. In order to add the restriction site
an additional codon, GCC coding for alanine, was also added to the amino
terminus of the protein. The primers also add a Kpn I site immediately following
the translation stop codon.
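The features these primers are said to introduce can be located directly in the primer sequences. The following sketch is illustrative only; the helper names `revcomp` and `codons` are hypothetical and the sequences are those of SEQ ID NO:24 and SEQ ID NO:25 with the spacing removed.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codons(seq, start, count):
    """Return `count` consecutive codons of seq beginning at 0-based index `start`."""
    return [seq[i:i + 3] for i in range(start, start + 3 * count, 3)]


CF23 = "GAAACCATGGCCAGTGTGATTGCGCAGGCA"   # SEQ ID NO:24
CF24 = "GAAAGGTACCTTACAACAACTGTGCCAGC"    # SEQ ID NO:25
NCO_I, KPN_I = "CCATGG", "GGTACC"

nco_pos = CF23.find(NCO_I)                 # 0-based position of the Nco I site
first_codons = codons(CF23, nco_pos + 2, 3)  # the ATG sits inside CCATGG
coding_view = revcomp(CF24)                # CF24 read on the coding strand

print(nco_pos, first_codons)
print(coding_view)
```

On this reading, the ATG within the Nco I site is followed by the added GCC codon, and on the coding strand the stop codon TAA of the CF24 region is immediately followed by GGTACC, matching the description above.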

PCR was performed using a Perkin-Elmer Cetus kit according to the
instructions of the vendor on a thermocycler manufactured by the same company.
The primers were at a concentration of 10 µM and the thermocycling conditions
were:

94°C 1 min, 50°C 2 min, 72°C 8 min for 10 cycles followed by
94°C 1 min, 72°C 8 min for 30 cycles.

Reactions with four different concentrations of template DNA all yielded the
expected 2.4 kb DNA fragment, along with several other smaller fragments. The
four PCR reaction mixes were pooled, digested with Nco I and Kpn I and the
2.4 kb fragments were purified and isolated from an agarose gel. The fragment
was inserted into a modified pBT430 expression vector (see Example 2)
containing a Kpn I site downstream of the Nco I site at the translation initiation
codon. DNA was isolated from 8 clones carrying the 2.4 kb fragment in the
pBT430 expression vector and transformed into the expression host strain
BL21(DE3).

Cultures were grown in TB medium containing ampicillin (100 mg/L) at
37°C overnight. The cells were collected by centrifugation and resuspended in
1/25th the original culture volume in 50 mM NaCl; 50 mM Tris-Cl, pH 7.5; 1 mM
EDTA, and frozen at -20°C, thawed at 37°C and sonicated, in an ice-water bath,
to lyse the cells. The lysate was centrifuged at 4°C for 5 min at 12,000 rpm. The
supernatant was removed and the pellet was resuspended in the above buffer.
The supernatant fractions were assayed for HDH enzyme activities to
identify clones expressing functional proteins. HDH activity was assayed as shown
below:

HDH ASSAY

Stock solutions            1.0 ml       0.20 ml      Final conc
0.2 M KPO4, pH 7.0         500 µl       100 µl       100 mM
3.7 M KCl                  270 µl       54 µl        1.0 M
0.5 M EDTA                 20 µl        4 µl         10 mM
1.0 M MgCl2                10 µl        2 µl         10 mM
2 mM NADPH                 100 µl       20 µl        0.20 mM

Make mixture of above reagents with amounts multiplied by number of assays.
Use 0.9 mls of mix for 1 ml assay; 180 µl of mix for 0.2 ml assay in microtiter dish.

Add
1.0 M ASA in 1.0 N HCl     1 µl         0.2 µl       1.0 mM
to 1/2 the assay mix; remaining 1/2 lacks ASA to serve as blank
enzyme extract             10-100 µl    2-20 µl
H2O                        to 1.0 ml    to 0.20 ml

Add enzyme extract last to start reaction. Incubate at ~30°C; monitor
NADPH oxidation at 340 nm. 1 unit oxidizes 1 µmol NADPH/min at 30°C in the
1 ml reaction.
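The final concentrations in the HDH assay table, and the conversion of an absorbance change into units, can be sketched as follows. The stock volumes are taken from the table above; the NADPH extinction coefficient at 340 nm (6220 per M per cm) and the 1 cm path length are assumptions that do not appear in this document, and the function names are hypothetical.

```python
import math

# (reagent, stock concentration in M, µl per 1.0 ml assay, stated final M)
STOCKS = [
    ("KPO4 pH 7.0", 0.2, 500, 0.100),
    ("KCl", 3.7, 270, 1.0),
    ("EDTA", 0.5, 20, 0.010),
    ("MgCl2", 1.0, 10, 0.010),
    ("NADPH", 0.002, 100, 0.00020),
]
ASSAY_UL = 1000.0


def final_conc(stock_m, vol_ul, total_ul=ASSAY_UL):
    """C1*V1 = C2*V2 rearranged for the final concentration (M)."""
    return stock_m * vol_ul / total_ul


def units_per_ml_extract(delta_a340_per_min, extract_ul,
                         epsilon=6220.0, path_cm=1.0, assay_ml=1.0):
    """Units (µmol NADPH oxidized per min) per ml of extract.

    epsilon (M^-1 cm^-1) and path_cm are assumptions, not values from the text.
    """
    molar_per_min = delta_a340_per_min / (epsilon * path_cm)   # mol/L/min
    umol_per_min = molar_per_min * (assay_ml / 1000.0) * 1e6   # µmol/min in the assay
    return umol_per_min / (extract_ul / 1000.0)


mix_volume_ul = sum(vol for _, _, vol, _ in STOCKS)
for name, stock, vol, stated in STOCKS:
    print(name, final_conc(stock, vol), stated)
print("mix volume (µl):", mix_volume_ul)
print(units_per_ml_extract(0.0622, 50))
```

The reagent volumes sum to the 0.9 ml of mix called for per 1 ml assay, and each computed concentration agrees with the stated value to within rounding (3.7 M KCl gives 0.999 M).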

Four of eight extracts showed HDH activity well above the control. These
four were then assayed for AK activity. AK activity was assayed as shown below:

AK ASSAY

Assay mix (for 12 X 1.0 mL or 48 X 0.25 mL assays):
2.5 mls H2O
2.0 mls 4M KOH
2.0 mls 4M NH2OH-HCl
1.0 mls 1M Tris-HCl pH 8.0
0.5 mls 0.2M ATP (121 mg/ml in 0.2M NaOH)
50 µls 1M MgSO4
pH of assay mix should be 7-8

Each 1.5 ml eppendorf assay tube contains:

                        MACRO assay     micro assay
assay mix               0.64 mls        0.16 mls
0.2M L-Aspartate        0.04 mls        0.01 mls
extract                 5-120 µl        1-30 µl
H2O to total vol.       0.8 mls         0.2 mls

Assay tubes are incubated at 30°C for 30-60 min
Add to develop color:
FeCl3 reagent           0.4 mls         0.1 mls

FeCl3 reagent is:       10% w/v FeCl3   50 g
                        3.3% TCA        15.5 g
                        0.7% HCl        35 mls HCl
                        H2O to 500 mls

Spin for 2 min in eppendorf centrifuge tube.
Read OD at 540 nm.
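The volumes in the AK assay can be cross-checked arithmetically. This is a minimal sketch that assumes the column assignment of the macro and micro values as read from the assay table (0.8 ml and 0.2 ml total volumes, respectively); the dictionary layout and function names are hypothetical.

```python
import math

# Assay mix components in ml, as listed for one batch.
ASSAY_MIX_ML = {
    "H2O": 2.5,
    "4 M KOH": 2.0,
    "4 M NH2OH-HCl": 2.0,
    "1 M Tris-HCl pH 8.0": 1.0,
    "0.2 M ATP": 0.5,
    "1 M MgSO4": 0.05,
}

# Per-tube volumes in ml; `n` is the number of assays one batch is meant to serve.
FORMATS = {
    "macro": {"mix": 0.64, "asp": 0.04, "max_extract": 0.120, "total": 0.8, "n": 12},
    "micro": {"mix": 0.16, "asp": 0.01, "max_extract": 0.030, "total": 0.2, "n": 48},
}

ASPARTATE_STOCK_MM = 200.0  # 0.2 M L-aspartate


def mix_needed(fmt):
    """Total assay mix (ml) consumed by one batch of tubes."""
    f = FORMATS[fmt]
    return f["n"] * f["mix"]


def aspartate_final_mm(fmt):
    """Final L-aspartate concentration (mM) in the incubation volume."""
    f = FORMATS[fmt]
    return ASPARTATE_STOCK_MM * f["asp"] / f["total"]


def water_at_max_extract(fmt):
    """Water (ml) left to add when the largest extract volume is used."""
    f = FORMATS[fmt]
    return f["total"] - f["mix"] - f["asp"] - f["max_extract"]


total_mix = sum(ASSAY_MIX_ML.values())
print("mix prepared (ml):", round(total_mix, 2))
for fmt in FORMATS:
    print(fmt, round(mix_needed(fmt), 2), aspartate_final_mm(fmt),
          round(water_at_max_extract(fmt), 3))
```

Under this reading one batch of mix covers either format with a small excess, both formats give the same final aspartate concentration, and the largest extract volume exactly fills the tube to its total volume.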

Two extracts also had high levels of AK enzyme activity. These two
extracts were then tested for inhibition of AK or HDH activity by the pathway
end-products, lys, thr and met. Neither the AK nor the HDH activity of the
extract from clone 5 was inhibited by 30 mM concentrations of any of the
end-products.

The supernatant and pellet fractions of several of the extracts were also
analyzed by SDS polyacrylamide gel electrophoresis. In the extract from clone 5,
the major protein visible by Coomassie blue staining in both the pellet and
supernatant fractions had a molecular weight of about 85 kd, the expected size for
AKII-HDHII. The metL gene in plasmid pBT718 from clone 5 was used for all
subsequent work.

Plant amino acid biosynthetic enzymes are known to be localized in the
chloroplasts and therefore are synthesized with a chloroplast targeting signal.
Bacterial proteins have no such signal. A chloroplast transit sequence (cts) was
therefore fused to the metL coding sequence in the chimeric genes described
below. For corn the cts used was based on the cts of the small subunit of
ribulose 1,5-bisphosphate carboxylase from corn [Lebrun et al. (1987) Nucleic
Acids Res. 15:4360] and is designated mcts.

Oligonucleotides SEQ ID NO:18 and SEQ ID NO:19, which encode the
carboxy terminal part of the corn chloroplast targeting signal, were annealed,
resulting in Xba I and Nco I compatible ends, purified via polyacrylamide gel
electrophoresis, and inserted into Xba I plus Nco I digested pBT718. The
insertion of the correct sequence was verified by DNA sequencing yielding
pBT725. To complete the corn chloroplast targeting signal, pBT725 was digested
with Bgl II and Xba I, and a 1.14 kb BamH I to Xba I fragment from pBT580
containing the glutelin 2 promoter plus the amino terminal part of the corn
chloroplast targeting signal was inserted creating pBT726.
To construct the chimeric gene:
globulin 1 promoter/mcts/metL/globulin 1 3' region

the 2.6 kb Nco I to Kpn I fragment containing the mcts/metL coding sequence was
isolated from plasmid pBT726 and inserted into Nco I plus Kpn I digested pCC50
creating plasmid pBT727.

To construct the chimeric gene:
glutelin 2 promoter/mcts/metL/NOS 3' region

the 2.6 kb Nco I to Kpn I fragment containing the mcts/metL coding sequence was
isolated from plasmid pBT726 and linked to the 1.02 kb BamH I to Nco I glutelin
2 promoter fragment described in Example 6 and to a Kpn I to Hind III fragment
carrying the NOS 3' region creating plasmid pBT728.
EXAMPLE 8

Transformation of Corn with Chimeric Genes for
Expression of Corn CS and E. coli
in the Embryo and Endosperm
Corn was transformed with the chimeric genes:
globulin 1 promoter/mcts/metL/globulin 1 3' region (in pBT727)

globulin 1 promoter/corn CS coding region/globulin 1 3' region (in pFS1198)

glutelin 2 promoter/mcts/metL/NOS 3' region (in pBT728)

glutelin 2 promoter/corn CS coding region/10 kD 3' region (in pFS1196)

The bacterial bar gene from Streptomyces hygroscopicus that confers
resistance to the herbicide glufosinate [Thompson et al. (1987) The EMBO Journal
6:2519-2523] was used as the selectable marker for corn transformation. The bar
gene had its translation initiation codon changed from GTG to ATG for proper
translation initiation in plants [De Block et al. (1987) The EMBO Journal
6:2513-2518]. The bar gene was driven by the 35S promoter from Cauliflower
Mosaic Virus and uses the termination and polyadenylation signal from the
octopine synthase gene from Agrobacterium tumefaciens.

Embryogenic callus cultures were initiated from immature embryos (about
1.0 to 1.5 mm) dissected from kernels of a corn line bred for giving a "type II
callus" tissue culture response. The embryos were dissected 10 to 12 d after
pollination and were placed with the axis-side down and in contact with
agarose-solidified N6 medium [Chu et al. (1974) Sci Sin 18:659-668] supplemented with
1.0 mg/L 2,4-D (N6-1.0). The embryos were kept in the dark at 27°C. Friable
embryogenic callus consisting of undifferentiated masses of cells with somatic
proembryos and somatic embryos borne on suspensor structures proliferated from
the scutellum of the immature embryos. Clonal embryogenic calli isolated from
individual embryos were identified and sub-cultured on N6-1.0 medium every 2 to
3 weeks.

The particle bombardment method was used to transfer genes to the callus
culture cells. A Biolistic PDS-1000/He (BioRAD Laboratories, Hercules, CA)
was used for these experiments.

Circular plasmid DNA or DNA which had been linearized by restriction
endonuclease digestion was precipitated onto the surface of gold particles. DNA
from two or three different plasmids, one containing the selectable marker for corn
transformation, and one or two containing the chimeric genes for increased
methionine accumulation in seeds, were co-precipitated. To accomplish this 2.5 µg
of each DNA (in water at a concentration of about 1 mg/mL) was added to 25 µL
of gold particles (average diameter of 1.0 µm) suspended in water (60 mg of gold
per mL). Calcium chloride (25 µL of a 2.5 M solution) and spermidine (10 µL of
a 0.1 M solution) were then added to the gold-DNA suspension as the tube was
vortexing for 3 min. The gold particles were centrifuged in a microfuge for 1 sec
and the supernatant removed. The gold particles were then resuspended in 1 mL
of absolute ethanol, were centrifuged again and the supernatant removed. Finally,
the gold particles were resuspended in 25 µL of absolute ethanol and sonicated
twice for one sec. Five µL of the DNA-coated gold particles were then loaded on
each macrocarrier disk and the ethanol was allowed to evaporate away leaving the
DNA-covered gold particles dried onto the disk.
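The quantities in the coating procedure can be worked through numerically. The sketch below is illustrative: it assumes DNA at exactly 1 mg/mL, no loss of gold or DNA during the washes, and complete binding of DNA to the gold, none of which is stated in the text; the function name `coating_summary` is hypothetical.

```python
import math

GOLD_UL, GOLD_MG_PER_ML = 25.0, 60.0     # gold suspension added
DNA_UG_EACH, DNA_UG_PER_UL = 2.5, 1.0    # per plasmid; ~1 mg/mL assumed exact
CACL2_UL, CACL2_M = 25.0, 2.5
SPERMIDINE_UL, SPERMIDINE_M = 10.0, 0.1
FINAL_ETHANOL_UL, LOAD_UL = 25.0, 5.0    # final resuspension and per-disk load


def coating_summary(n_plasmids):
    """Volumes, concentrations and per-disk amounts for n co-precipitated plasmids."""
    dna_ul = n_plasmids * DNA_UG_EACH / DNA_UG_PER_UL
    total_ul = GOLD_UL + dna_ul + CACL2_UL + SPERMIDINE_UL
    gold_mg = GOLD_UL * GOLD_MG_PER_ML / 1000.0
    disks = FINAL_ETHANOL_UL / LOAD_UL
    return {
        "total_ul": total_ul,
        "cacl2_final_m": CACL2_M * CACL2_UL / total_ul,
        "spermidine_final_mm": 1000.0 * SPERMIDINE_M * SPERMIDINE_UL / total_ul,
        "gold_mg": gold_mg,
        "disks": disks,
        "gold_mg_per_disk": gold_mg / disks,
        # Upper bound: assumes every microgram of DNA ends up on the gold.
        "dna_ug_per_disk_max": n_plasmids * DNA_UG_EACH / disks,
    }


two = coating_summary(2)
three = coating_summary(3)
print(two)
print(three)
```

With these assumptions one preparation holds 1.5 mg of gold and loads five disks at about 0.3 mg of gold each, which is in line with bombarding five to seven plates per experiment.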

Embryogenic callus (from the callus line designated #LH132.5.X,
#LH132.6.X, or #LH132.7.X) was arranged in a circular area of about 4 cm in
diameter in the center of a 100 X 20 mm petri dish containing N6-1.0 medium
supplemented with 0.25M sorbitol and 0.25M mannitol. The tissue was placed on
this medium for 4-6 h prior to bombardment as a pretreatment and remained on the
medium during the bombardment procedure. At the end of the 4-6 h pretreatment
period, the petri dish containing the tissue was placed in the chamber of the
PDS-1000/He. The air in the chamber was then evacuated to a vacuum of 28-29
inch of Hg. The macrocarrier was accelerated with a helium shock wave using
a rupture membrane that bursts when the He pressure in the shock tube reaches
1080-1100 psi. The tissue was placed approximately 8 cm from the stopping
screen. Five to seven plates of tissue were bombarded with the DNA-coated gold
particles. Following bombardment, the callus tissue was transferred to N6-1.0
medium without supplemental sorbitol or mannitol.

Within 3-5 days after bombardment the tissue was transferred to selective
medium, N6-1.0 medium that contained 2 mg/L bialaphos. All tissue was
transferred to fresh N6-1.0 medium supplemented with bialaphos every 2 weeks.
After 6-12 weeks clones of actively growing callus were identified. Callus was
then transferred to an MS-based medium that promotes plant regeneration.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: E. I. DU PONT DE NEMOURS AND COMPANY

(ii) TITLE OF INVENTION: NUCLEIC ACID FRAGMENTS, CHIMERIC GENES AND
METHODS FOR INCREASING THE METHIONINE CONTENT OF THE SEEDS OF PLANTS

(iii) NUMBER OF SEQUENCES: 27

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: E. I. DU PONT DE NEMOURS AND COMPANY
(B) STREET: 1007 MARKET STREET
(C) CITY: WILMINGTON
(D) STATE: DELAWARE
(E) COUNTRY: U.S.A.
(F) ZIP: 19898

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: DISKETTE, 3.50 INCH
(B) COMPUTER: IBM
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: MICROSOFT WORD, 2.0C

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: BARBARA C. SZEGELL
(B) REGISTRATION NUMBER: 30,684
(C) REFERENCE/DOCKET NUMBER: BB-1059-A

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 302-992-4931
(B) TELEFAX: 302-892-7949
(C) TELEX: 835420

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1639 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 2..1441

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

G AAT TCC GGC TCG AAG CCG CCG CGA CCG AAC GAG CGA AGC GTC CCT 46
Asn Ser Gly Ser Lys Pro Pro Arg Pro Asn Glu Arg Ser Val Pro
1 5 10 15

TCC CGC GCC GAC GCC GAA ACC CTA GCT CCT CTT ACG CCA TGG CCA CCG 94
Ser Arg Ala Asp Ala Glu Thr Leu Ala Pro Leu Thr Pro Trp Pro Pro
20 25 30

TGT CGC TCA CTC CGC AGG CGG TCT TCT CCA CCG AGT CCG GCG GCG CCC 142
Cys Arg Ser Leu Arg Arg Arg Ser Ser Pro Pro Ser Pro Ala Ala Pro
35 40 45

TGG CCT CTG CCA CCA TCC TCC GCT TCC CGC CAA ACT TCG TCC GCC TCC 190
Trp Pro Leu Pro Pro Ser Ser Ala Ser Arg Gln Thr Ser Ser Ala Ser
50 55 60

GCG GCG GCG GAT GTC AGC GCA ATT CCT AAC GCT AAG GTT GCG CAG CCG 238
Ala Ala Ala Asp Val Ser Ala Ile Pro Asn Ala Lys Val Ala Gln Pro
65 70 75

TCC GCC GTC GTA TTG GCC GAG CGT AAC CTG CTC GGC TCC GAC GCC AGC 286
Ser Ala Val Val Leu Ala Glu Arg Asn Leu Leu Gly Ser Asp Ala Ser
80 85 90 95

CTC GCC GTC CAC GCG GGG GAG AGG CTG GGA AGA AGG ATA GCC ACG GAT 334
Leu Ala Val His Ala Gly Glu Arg Leu Gly Arg Arg Ile Ala Thr Asp
100 105 110

GCT ATC ACC ACG CCG GTA GTG AAC ACG TCG GCC TAC TGG TTC AAC AAC 382
Ala Ile Thr Thr Pro Val Val Asn Thr Ser Ala Tyr Trp Phe Asn Asn
115 120 125

TCG CAA GAG CTA ATC GAC TTT AAG GAG GGG AGG CAT GCT AGC TTC GAG 430
Ser Gln Glu Leu Ile Asp Phe Lys Glu Gly Arg His Ala Ser Phe Glu
130 135 140

TAT GGG AGG TAT GGG AAC CCG ACC ACG GAG GCA TTA GAG AAG AAG ATG 478
Tyr Gly Arg Tyr Gly Asn Pro Thr Thr Glu Ala Leu Glu Lys Lys Met
145 150 155

AGC GCA CTG GAG AAA GCA GAG TCC ACC GTG TTT GTG GCG TCA GGG ATG 526
Ser Ala Leu Glu Lys Ala Glu Ser Thr Val Phe Val Ala Ser Gly Met
160 165 170 175

TAT GCA GCT GTG GCT ATG CTC AGC GCA CTT GTC CCT GCT GGT GGG CAC 574
Tyr Ala Ala Val Ala Met Leu Ser Ala Leu Val Pro Ala Gly Gly His
180 185 190

ATT GTG ACC ACC ACG GAT TGC TAC CGC AAG ACA AGG ATT TAC ATG GAA 622
Ile Val Thr Thr Thr Asp Cys Tyr Arg Lys Thr Arg Ile Tyr Met Glu
195 200 205

AAT GAG CTC CCT AAG AGG GGA ATT TCG ATG ACT GTC ATT AGG CCT GCT 670
Asn Glu Leu Pro Lys Arg Gly Ile Ser Met Thr Val Ile Arg Pro Ala
210 215 220

GAC ATG GAT GCT CTC CAA AAT GCC TTG GAC AAC AAT AAT GTA TCT CTT 718
Asp Met Asp Ala Leu Gln Asn Ala Leu Asp Asn Asn Asn Val Ser Leu
225 230 235

TTC TTC ACG GAG ACT CCT ACA AAT CCA TTT CTC AGA TGC ATT GAT ATT 766
Phe Phe Thr Glu Thr Pro Thr Asn Pro Phe Leu Arg Cys Ile Asp Ile
240 245 250 255

GAA CAT GTA TCA AAT ATG TGC CAT AGC AAG GGA GCG TTG CTT TGT ATT 814
Glu His Val Ser Asn Met Cys His Ser Lys Gly Ala Leu Leu Cys Ile
260 265 270

GAC AGT ACT TTC GCG TCA CCT ATC AAT CAG AAG GCA TTA ACT TTA GGT 862
Asp Ser Thr Phe Ala Ser Pro Ile Asn Gln Lys Ala Leu Thr Leu Gly
275 280 285

GCT GAC CTA GTT ATT CAT TCT GCA ACG AAG TAC ATT GCT GGA CAC AAT 910
Ala Asp Leu Val Ile His Ser Ala Thr Lys Tyr Ile Ala Gly His Asn
290 295 300

GAT GTT ATT GGA GGA TGC GTC AGT GGC AGA GAT GAG TTA GTT TCC AAA 958
Asp Val Ile Gly Gly Cys Val Ser Gly Arg Asp Glu Leu Val Ser Lys
305 310 315

GTT CGT ATT TAC CAC CAT GTA GTT GGT GGT GTT CTA AAC CCG AAT GCT 1006
Val Arg Ile Tyr His His Val Val Gly Gly Val Leu Asn Pro Asn Ala
320 325 330 335

GCG TAC CTT ATC CTT CGA GGT ATG AAG ACA CTG CAT CTC CGT GTG CAA 1054
Ala Tyr Leu Ile Leu Arg Gly Met Lys Thr Leu His Leu Arg Val Gln
340 345 350

TGT CAG AAC GAC ACT GCT CTT CGG ATG GCC CAG TTT TTA GAG GAG CAT 1102
Cys Gln Asn Asp Thr Ala Leu Arg Met Ala Gln Phe Leu Glu Glu His
355 360 365

CCA AAG ATT GCT CGT GTC TAC TAT CCT GGC TTG CCA AGT CAC CCT GAA 1150
Pro Lys Ile Ala Arg Val Tyr Tyr Pro Gly Leu Pro Ser His Pro Glu
370 375 380

CAT CAC ATT GCC AAG AGT CAA ATG ACT GGC TTT GGC GGT GTT GTT AGT 1198
His His Ile Ala Lys Ser Gln Met Thr Gly Phe Gly Gly Val Val Ser
385 390 395

TTT GAG GTT GCT GGA GAC TTT GAT GCT ACG AGG AAA TTC ATT GAT TCT 1246
Phe Glu Val Ala Gly Asp Phe Asp Ala Thr Arg Lys Phe Ile Asp Ser
400 405 410 415

GTT AAA ATA CCC TAT CAT GCG CCT TCT TTT GGA GGC TGT GAG AGC ATA 1294
Val Lys Ile Pro Tyr His Ala Pro Ser Phe Gly Gly Cys Glu Ser Ile
420 425 430

ATT GAT CAG CCT GCC ATC ATG TCC TAC TGG GAT TCA AAG GAG CAG CGG 1342
Ile Asp Gln Pro Ala Ile Met Ser Tyr Trp Asp Ser Lys Glu Gln Arg
435 440 445

GAC ATC TAC GGG ATC AAG GAC AAC CTG ATC AGG TTC AGC ATT GGT GTG 1390
Asp Ile Tyr Gly Ile Lys Asp Asn Leu Ile Arg Phe Ser Ile Gly Val
450 455 460

GAG GAT TTC GAG GAT CTT AAG AAC GAT CTC GTG CAG GCC CTC GAG AAG 1438
Glu Asp Phe Glu Asp Leu Lys Asn Asp Leu Val Gln Ala Leu Glu Lys
465 470 475

ATC TAA GCACTCTAAT CAGTTTGTAT TGACAAAAT ATGAGGTGAT GGCTGTCTTG 1494
Ile
480

GATCTTGTCA AGATCTGTGA CAATGATATG AGCTGATGAC TGCGAATAAG 1544

TTCTCTTTTG CTTATTTTAT CCGTCAAATT CAAAAAAAAA AAAAAAAAAA 1594

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAC TCGAG 1639



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
AATTCATGAG TGCA 14

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
AATTTGCACT CATG 14

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1350 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1350

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

ATG GOT C3AA ATT GTT GTC TCC AAA TTT GGC GGT ACC AGC GTA GCT GAT 48 
Met Ala Glu lie Val Val Ser Lys Phe Gly Gly Thr Ser Val Ala Asp 
15 10 15 

TTT GAC GCX: ATG AAC CGC AGC GCT GAT ATT GTG CTT TOT GAT GCC AAC 96 
Phe Asp Ala Met Asn Arg Ser Ala Asp lie Val Leu Ser Asp Ala Asn 
20 25 30 

GTG CGT TTA GTT GTC CTC TCG GCT TCT GCT GGT ATC ACT AAT CTG CTG 144 
Val Arg Leu Val Val Leu Ser Ala Ser Ala Gly lie Thr Asn Leu Leu 
35 40 45 

GTC GCT TTA GCT GAA <3GA CTG GAA CCT GGC GAG CGA TTC GAA AAA CTC 192 
Val Ala Leu Ala Glu Gly Leu Glu Pro Gly Glu Arg Phe Glu Lys Leu 
50 55 60 

GAC GCT ATC CGC AAC ATC CAG TTT GCC ATT CTG (3AA CGT CTG CGT TAC 240 
Asp Ala lie Arg Asn lie Gin Phe Ala lie Leu Glu Arg Leu Arg Tyr 
65 70 75 80 

CCG AAC GTT ATC CGT GAA GAG ATT GAA CGT CTG CTG GAG AAC ATT ACT 288 
Pro Asn Val He Arg Glu Glu He Glu Arg Leu Leu Glu Asn He Thr 
85 90 95 

GTT CTG GCA GAA GCG GCG GCG CTG GCA ACG TCT CCG GCG CTG ACA GAT 336 
Val Leu Ala Glu Ala Ala Ala Leu Ala Thr Ser Pro Ala Leu Thr Asp 
100 105 110 

GAG CTG GTC AGC CAC QGC GAG CTG ATG TCG ACC CTG CTG TTT GTT GAG 384 
Glu Leu Val Ser His Gly Glu Leu Met Ser Thr Leu Leu Phe Val Glu 
115 120 125 



wo 95/3 1 554 ^^JCT/US95y05545 



55 



ATC CTG CGC GAA CGC GAT GTT CAG GCA CAG TGG TTT GAT GTA CGT AAA 432 
He Leu Arg Glu Arg Asp Val Gin Ala Gin Trp Phe Asp Val Arg Lys 
130 135 140 

GTG ATG CGT ACC AAC GAG CGA TTT GGT CGT GCA GAG CCA GAT ATA GCC 480 
Val Met Arg Thr Asn Asp Arg Phe Gly Arg Ala Glu Pro Asp He Ala 
145 150 155 160 

bCG CTG GCG GAA CTG GCC GCG CTG CAG CTG CTC CCA CGT CTC AAT GAA 528 
Ala IfCU Ala Glu Leu Ala Ala Leu Gin Leu Leu Pro Arg Leu Asn Glu 
165 170 175 

GGC TTA GTG ATC ACC CAG GGA TTT ATC GGT AGC GAA AAT AAA GGT CGT 576 
Gly Leu Val He Thr Gin Gly Phe lie Gly Ser Glu Asn Lys Gly Arg 
IBO 185 190 

ACA ACG ACG CTT GGC CGT GGA GGC AGC GAT TAT ACG GCA GCC TTG CTG 624 
Thr Thr Thr Leu Gly Arg Gly Gly Ser Asp Tyr Thr Ala Ala Leu Leu 
195 200 205 

GCG GAG GCT TTA CAC GCA TCT CGT GTT GAT ATC TGG ACC GAC GTC CCG 672 
Ala Glu Ala Leu His Ala Ser Arg Val Asp He Trp Thr Asp Val Pro 
210 215 220 

GGC ATC TAC ACC ACC GAT CCA CGC GTA GTT TCC GCA GCA AAA CGC ATT 720 
Gly He Tyr Thr Thr Asp Pro Arg Val Val Ser Ala Ala Lys Arg He 
225 230 235 240 

GAT GAA ATC GCG TTT GCC GAA GCG GCA GAG ATG GCA ACT TTT GGT GCA 766 
Asp Glu He Ala Phe Ala Glu Ala Ala Glu Met Ala Thr Phe Gly Ala 
245 250 255 

AAA GTA CTG CAT CCG GCA ACG TTG CTA CCC GCA GTA CGC AGC GAT ATC 816 
Lys Val Leu His Pro Ala Thr Leu Leu Pro Ala Val Arg Ser Asp He 
260 265 270 

CCG GTC TTT GTC GGC TCC AGC AAA GAC CCA CGC GCA GGT GGT ACG CTG 864 
Pro Val Phe Val Gly Ser Ser Lys Asp Pro Arg Ala Gly Gly Thr Leu 
275 280 285 

GTG TGC AAT AAA ACT GAA AAT CCG CCG CTG TTC CGC GCT CTG GCG CTT 912 
Val Cys Asn Lys Thr Glu Asn Pro Pro I^eu Phe Arg Ala Leu Ala Leu 
290 295 300 

CGT CGC AAT CAG ACT CTG CTC ACT TTG CAC AGC CTG AAT ATG CTG CAT 960 
Arg Arg Asn Gin Thr Leu Leu Thr Leu His Ser Leu Asn Met I«eu His 
305 310 315 320 

TCT CGC GGT TTC CTC GCG GAA GTT TTC GGC ATC CTC GCG CGG CAT AAT 1008 
Ser Arg Gly Phe Leu Ala Glu Val Phe Gly He Leu Ala Arg His Asn 
325 330 335 



ATT TCG GTA GAC TTA ATC ACC ACG TCA GAA GTG AGC GTG GCA TTA ACC 1056
Ile Ser Val Asp Leu Ile Thr Thr Ser Glu Val Ser Val Ala Leu Thr
340 345 350

CTT GAT ACC ACC GGT TCA ACC TCC ACT GGC GAT ACG TTG CTG ACG CAA 1104
Leu Asp Thr Thr Gly Ser Thr Ser Thr Gly Asp Thr Leu Leu Thr Gln
355 360 365

TCT CTG CTG ATG GAG CTT TCC GCA CTG TGT CGG GTG GAG GTG GAA GAA 1152
Ser Leu Leu Met Glu Leu Ser Ala Leu Cys Arg Val Glu Val Glu Glu
370 375 380

GGT CTG GCG CTG GTC GCG TTG ATT GGC AAT GAC CTG TCA AAA GCC TGC 1200
Gly Leu Ala Leu Val Ala Leu Ile Gly Asn Asp Leu Ser Lys Ala Cys
385 390 395 400

GCC GTT GGC AAA GAG GTA TTC GGC GTA CTG GAA CCG TTC AAC ATT CGC 1248
Ala Val Gly Lys Glu Val Phe Gly Val Leu Glu Pro Phe Asn Ile Arg
405 410 415

ATG ATT TGT TAT GGC GCA TCC AGC CAT AAC CTG TGC TTC CTG GTG CCC 1296
Met Ile Cys Tyr Gly Ala Ser Ser His Asn Leu Cys Phe Leu Val Pro
420 425 430

GGC GAA GAT GCC GAG CAG GTG GTG CAA AAA CTG CAT AGT AAT TTG TTT 1344
Gly Glu Asp Ala Glu Gln Val Val Gln Lys Leu His Ser Asn Leu Phe
435 440 445

GAG TAA 1350
Glu *
450
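
The number printed at the right of each nucleotide row in this listing is the running coordinate of the last base on that row, so each value should equal the previous one plus the number of bases on the row. A minimal Python sketch of that consistency check follows; the function name `check_running_coordinates` and the row tuples are illustrative assumptions, not part of the filing.

```python
def check_running_coordinates(rows, start=0):
    """Compare printed end-of-row coordinates with a running base count.

    rows  : list of (sequence_text, printed_coordinate) tuples
    start : coordinate of the last base before the first row
    Returns a list of (printed, expected) pairs for rows that disagree.
    """
    problems = []
    position = start
    for text, printed in rows:
        # Count every non-space character as one base.
        position += len("".join(text.split()))
        if position != printed:
            problems.append((printed, position))
    return problems


# Last two rows of the SEQ ID NO:4 coding region as listed above.
rows = [
    ("GGC GAA GAT GCC GAG CAG GTG GTG CAA AAA CTG CAT AGT AAT TTG TTT", 1344),
    ("GAG TAA", 1350),
]
print(check_running_coordinates(rows, start=1296))
```

A row whose printed coordinate was misread during scanning shows up as a mismatch against the counted value, while later rows still check out because the count does not resynchronise to the printed number.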



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GATCCATGGC TGAAATTGTT GTCTCCAAAT TTGGCG 36



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GTACCGCCAA ATTTGGAGAC AACAATTTCA GCCATG 36
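
SEQ ID NO:5 and SEQ ID NO:6 appear to be a complementary oligonucleotide pair. The Python sketch below checks that reading by aligning one strand against the reverse complement of the other and reporting the unpaired 5' ends. It assumes the two scan-damaged positions in SEQ ID NO:5 read as A and C (the bases implied by SEQ ID NO:6); the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def annealed_overhangs(top: str, bottom: str):
    """Return the 5' overhangs (top, bottom) left when two oligos anneal.

    Finds the smallest offset k such that top[k:] pairs with the start of
    the reverse complement of bottom. Returns None if no pairing exists.
    """
    rc = reverse_complement(bottom)
    for k in range(len(top)):
        if rc.startswith(top[k:]):
            paired = len(top) - k
            # Bases of rc beyond the paired region correspond to the
            # unpaired 5' end of the bottom strand.
            return top[:k], reverse_complement(rc[paired:])
    return None


# Sequences as listed, with the assumed reading of the damaged positions.
SEQ5 = "GATCCATGGCTGAAATTGTTGTCTCCAAATTTGGCG"
SEQ6 = "GTACCGCCAAATTTGGAGACAACAATTTCAGCCATG"

print(annealed_overhangs(SEQ5, SEQ6))
```

Under that assumption the two strands pair over 32 bases and each leaves a four-base 5' overhang, which is consistent with a synthetic linker designed for insertion between two restriction sites.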






(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ATGGCAGCCA AGATGCTTGC ATTGTTCGCT 30



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GAATGCAGCA CCAACAAAGG GTTGCTGTAA 30

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2123 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1113.. 1385 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

TCTAGA<K:Crr ATTACCATCT CTACTCACGG GTCGTAGAGG TGGrGAG<5TA 50 

C^GCTACAGCT GGT(SACAATC CTACTTCACCC TTTGTAATCC TCTACGGCTC 100 

TACGCGTAGT TAATTGGTTA GATGTCAACC CCCTCTCTAA GTGGCAGTAG 150 

TGGGCTTGGT TATACCTGCT AGTGCCTC^GG (SATGTTCTAT TTTTCTAGTA 200 

(5TGCTTGATC AAACATTGCA TAGTTTGACT TGGGACAAAC TGTCTGATAT 250 
ATATATATAT TTTTGGGCAG AGGGAGCAGT AAGAACTTAT TTAGAAATGT 300

AATCATTTGT TAAAAAAGGT TTAATTTTGC TGCTTTCTTT CGTTAATGTT 350 

GTTTTCACAT TAGATTTTCT TTGTGTTATA TACACTGGAT ACATACAAAT 400 

TCAGTTGCAG TAGTCTCTTA ATCCACATCA GCTAGGCATA CTTTAGCAAA 450 

AGCAAATTAC ACAAATCTAG TGTGCCTGTC GTCACATTCT CAATAAACTC 500 

GTCATGTTTT ACTAAAAGTA CCTTTTCGAA GCATCATATT AATCCGAAAA 550 

CAGTTAGGGA AGTCTCCAAA TCTGACCAAA TGCCAAGTCA TCGTCCAGCT 600 

TATCAGCATC CAACTTTCAG TTTCGCATGT GCTAGAAATT GTTTTTCATC 650

TACATGGCCA TTGTTGACTG CATGCATCTA TAAATAGGAC CTAGACGATC 700 

AATCGCAATC GCATATCCAC TATTCTCTAG GAAGCAAGGG AATCACATCG 750 
CC 752 

ATG GCA GCC AAG ATG TTT GCA TTG TTT GCG CTC CTA GCT CTT TGT 797
Met Ala Ala Lys Met Phe Ala Leu Phe Ala Leu Leu Ala Leu Cys
-20 -15 -10

GCA ACC GCC ACT AGT GCT ACC CAT ATC CCA GGG CAC TTG TCA CCA 842
Ala Thr Ala Thr Ser Ala Thr His Ile Pro Gly His Leu Ser Pro
-5 1 5

CTA CTG ATG CCA TTG GCT ACC ATG AAC CCA TGG ATG CAG TAC TGC 887
Leu Leu Met Pro Leu Ala Thr Met Asn Pro Trp Met Gln Tyr Cys
10 15 20

ATG AAG CAA CAG GGG GTT GCC AAC TTG TTA GCG TGG CCG ACC CTG 932
Met Lys Gln Gln Gly Val Ala Asn Leu Leu Ala Trp Pro Thr Leu
25 30 35

ATG CTG CAG CAA CTG TTG GCC TCA CCG CTT CAG CAG TGC CAG ATG 977
Met Leu Gln Gln Leu Leu Ala Ser Pro Leu Gln Gln Cys Gln Met
40 45 50

CCA ATG ATG ATG CCG GGT ATG ATG CCA CCG ATG ACG ATG ATG CCG 1022
Pro Met Met Met Pro Gly Met Met Pro Pro Met Thr Met Met Pro
55 60 65

ATG CCG AGT ATG ATG CCA TCG ATG ATG GTG CCG ACT ATG ATG TCA 1067
Met Pro Ser Met Met Pro Ser Met Met Val Pro Thr Met Met Ser
70 75 80

CCA ATG ACG ATG GCT AGT ATG ATG CCG CCG ATG ATG ATG CCA AGC 1112
Pro Met Thr Met Ala Ser Met Met Pro Pro Met Met Met Pro Ser
85 90 95

ATG ATT TCA CCA ATG ACG ATG CCG AGT ATG ATG CCT TCG ATG ATA 1157
Met Ile Ser Pro Met Thr Met Pro Ser Met Met Pro Ser Met Ile
100 105 110

ATG CCG ACC ATG ATG TCA CCA ATG ATT ATG CCG AGT ATG ATG CCA 1202
Met Pro Thr Met Met Ser Pro Met Ile Met Pro Ser Met Met Pro
115 120 125

CCA ATG ATG ATG CCG AGC ATG GTG TCA CCA ATG ATG ATG CCA AAC 1247
Pro Met Met Met Pro Ser Met Val Ser Pro Met Met Met Pro Asn
130 135 140

ATG ATG ACA GTG CCA CAA TGT TAC TCT GGT TCT ATC TCA CAC ATT 1292
Met Met Thr Val Pro Gln Cys Tyr Ser Gly Ser Ile Ser His Ile
145 150 155

ATA CAA CAA CAA CAA TTA CCA TTC ATG TTC AGC CCC ACA GCC ATG 1337
Ile Gln Gln Gln Gln Leu Pro Phe Met Phe Ser Pro Thr Ala Met
160 165 170

GCG ATC CCA CCC ATG TTC TTA CAG CAG CCC TTT GTT GGT GCT GCA 1382
Ala Ile Pro Pro Met Phe Leu Gln Gln Pro Phe Val Gly Ala Ala
175 180 185

TTC TAG ATCTAGATAT AA 1400
Phe
190

GCATTTGTGT AGTACCCAAT AATGAAGTCG GCATGCCATC GCATACGACT 1450

CATTGTTTAG GAATAAAACA AGCTAATAAT GACTTTTCTC TCATTATAAC 1500

TTATATCTCT CCATGTCTGT TTGTGTGTTT GTAATGTCTG TTAATCTTAG 1550

TAGATTATAT TGTATATATA ACCATGTATT CTCTCCATTC CAAATTATAG 1600

GTCTTGCATT TCAAGATAAA TAGTTTTAAC CATACCTAGA CATTATGTAT 1650

ATATAGGCGG CTTAACAAAA GCTATGTACT CAGTAAAATC AAAACGACTT 1700

ACAATTTAAA ATTTAGAAAG TACATTTTTA TTAATAGACT AGGTGAGTAC 1750

TTGTGCGTTG CAACGGGAAC ATATAATAAC ATAATAACTT ATATACAAAA 1800

TGTATCTTAT ATTGTTATAA AAAATATTTC ATAATCCATT TGTAATCCTA 1850

GTCATACATA AATTTTGTTA TTTTAATTTA GTTGTTTCAC TACTACATTG 1900

CAACCATTAG TATCATGCAG ACTTCGATAT ATGCCAAGAT TTGCATGGTC 1950

TCATCATTGA AGAGCACATG TCACACCTGC CGGTAGAAGT TCTCTCGTAC 2000

ATTGTCAGTC ATCAGGTACG CACCACCATA CACGCTTGCT TAAACAAAAA 2050

AACAAGTGTA TGTGTTTGCG AAGAGAATTA AGACAGGCAG ACACAAAGCT 2100

ACCCGACGAT GGCGAGTCGG TCA 2123



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
ATGAACCCTT GGATGCA 17



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCCACAGCAA TGGCGAT 17



(2) INFORMATION FOR SEQ ID NO: 12: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 639 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3.. 635 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CC ATG GCA GCC AAG ATG TTT GCA TTG TTT GCG CTC CTA GCT CTT TGT 47
Met Ala Ala Lys Met Phe Ala Leu Phe Ala Leu Leu Ala Leu Cys
-20 -15 -10

GCA ACC GCC ACT AGT GCT ACC CAT ATC CCA GGG CAC TTG TCA CCA 92
Ala Thr Ala Thr Ser Ala Thr His Ile Pro Gly His Leu Ser Pro
-5 1 5

CTA CTG ATG CCA TTG GCT ACC ATG AAC CCT TGG ATG CAG TAC TGC 137
Leu Leu Met Pro Leu Ala Thr Met Asn Pro Trp Met Gln Tyr Cys
10 15 20

ATG AAG CAA CAG GGG GTT GCC AAC TTG TTA GCG TGG CCG ACC CTG 182
Met Lys Gln Gln Gly Val Ala Asn Leu Leu Ala Trp Pro Thr Leu
25 30 35

ATG CTG CAG CAA CTG TTG GCC TCA CCG CTT CAG CAG TGC CAG ATG 227
Met Leu Gln Gln Leu Leu Ala Ser Pro Leu Gln Gln Cys Gln Met
40 45 50

CCA ATG ATG ATG CCG GGT ATG ATG CCA CCG ATG ACG ATG ATG CCG 272
Pro Met Met Met Pro Gly Met Met Pro Pro Met Thr Met Met Pro
55 60 65

ATG CCG AGT ATG ATG CCA TCG ATG ATG GTG CCG ACT ATG ATG TCA 317
Met Pro Ser Met Met Pro Ser Met Met Val Pro Thr Met Met Ser
70 75 80

CCA ATG ACG ATG GCT AGT ATG ATG CCG CCG ATG ATG ATG CCA AGC 362
Pro Met Thr Met Ala Ser Met Met Pro Pro Met Met Met Pro Ser
85 90 95

ATG ATT TCA CCA ATG ACG ATG CCG AGT ATG ATG CCT TCG ATG ATA 407
Met Ile Ser Pro Met Thr Met Pro Ser Met Met Pro Ser Met Ile
100 105 110

ATG CCG ACC ATG ATG TCA CCA ATG ATT ATG CCG AGT ATG ATG CCA 452
Met Pro Thr Met Met Ser Pro Met Ile Met Pro Ser Met Met Pro
115 120 125

CCA ATG ATG ATG CCG AGC ATG GTG TCA CCA ATG ATG ATG CCA AAC 497
Pro Met Met Met Pro Ser Met Val Ser Pro Met Met Met Pro Asn
130 135 140

ATG ATG ACA GTG CCA CAA TGT TAC TCT GGT TCT ATC TCA CAC ATT 542
Met Met Thr Val Pro Gln Cys Tyr Ser Gly Ser Ile Ser His Ile
145 150 155

ATA CAA CAA CAA CAA TTA CCA TTC ATG TTC AGC CCC ACA GCA ATG 587
Ile Gln Gln Gln Gln Leu Pro Phe Met Phe Ser Pro Thr Ala Met
160 165 170

GCG ATC CCA CCC ATG TTC TTA CAG CAG CCC TTT GTT GGT GCT GCA 632
Ala Ile Pro Pro Met Phe Leu Gln Gln Pro Phe Val Gly Ala Ala
175 180 185

TTC TAG A 639
Phe
190



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTAGCCCGGG TAC 13



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CTAGGTACCC GGG 13



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CCACTTCATG ACCCATATCC CAGGGCACTT 30



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
TTCTATCTAG AATGCAGCAC CAACAAAGGG 30

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 579 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 3..575

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TC ATG ACC CAT ATC CCA GGG CAC TTG TCA CCA CTA CTG ATG CCA TTG 47
Met Thr His Ile Pro Gly His Leu Ser Pro Leu Leu Met Pro Leu
5 10 15

GCT ACC ATG AAC CCT TGG ATG CAG TAC TGC ATG AAG CAA CAG GGG 92
Ala Thr Met Asn Pro Trp Met Gln Tyr Cys Met Lys Gln Gln Gly
20 25 30

GTT GCC AAC TTG TTA GCG TGG CCG ACC CTG ATG CTG CAG CAA CTG 137
Val Ala Asn Leu Leu Ala Trp Pro Thr Leu Met Leu Gln Gln Leu
35 40 45

TTG GCC TCA CCG CTT CAG CAG TGC CAG ATG CCA ATG ATG ATG CCG 182
Leu Ala Ser Pro Leu Gln Gln Cys Gln Met Pro Met Met Met Pro
50 55 60

GGT ATG ATG CCA CCG ATG ACG ATG ATG CCG ATG CCG AGT ATG ATG 227
Gly Met Met Pro Pro Met Thr Met Met Pro Met Pro Ser Met Met
65 70 75

CCA TCG ATG ATG GTG CCG ACT ATG ATG TCA CCA ATG ACG ATG GCT 272
Pro Ser Met Met Val Pro Thr Met Met Ser Pro Met Thr Met Ala
80 85 90

AGT ATG ATG CCG CCG ATG ATG ATG CCA AGC ATG ATT TCA CCA ATG 317
Ser Met Met Pro Pro Met Met Met Pro Ser Met Ile Ser Pro Met
95 100 105

ACG ATG CCG AGT ATG ATG CCT TCG ATG ATA ATG CCG ACC ATG ATG 362
Thr Met Pro Ser Met Met Pro Ser Met Ile Met Pro Thr Met Met
110 115 120

TCA CCA ATG ATT ATG CCG AGT ATG ATG CCA CCA ATG ATG ATG CCG 407
Ser Pro Met Ile Met Pro Ser Met Met Pro Pro Met Met Met Pro
125 130 135

AGC ATG GTG TCA CCA ATG ATG ATG CCA AAC ATG ATG ACA GTG CCA 452
Ser Met Val Ser Pro Met Met Met Pro Asn Met Met Thr Val Pro
140 145 150

CAA TGT TAC TCT GGT TCT ATC TCA CAC ATT ATA CAA CAA CAA CAA 497
Gln Cys Tyr Ser Gly Ser Ile Ser His Ile Ile Gln Gln Gln Gln
155 160 165

TTA CCA TTC ATG TTC AGC CCC ACA GCA ATG GCG ATC CCA CCC ATG 542
Leu Pro Phe Met Phe Ser Pro Thr Ala Met Ala Ile Pro Pro Met
170 175 180

TTC TTA CAG CAG CCC TTT GTT GGT GCT GCA TTC TAG A 579
Phe Leu Gln Gln Pro Phe Val Gly Ala Ala Phe
185 190
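
Each coding row in the listing pairs a line of codons with the three-letter amino acids they encode, so the two lines can be cross-checked by translation. The Python sketch below does this with the standard genetic code; it is an illustrative aid rather than part of the filing, and the `translate` helper is a hypothetical name. One-letter codes are used for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA (spaces allowed) in frame 1; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First coding row of SEQ ID NO:17, starting at the ATG (position 3).
first_row = "ATG ACC CAT ATC CCA GGG CAC TTG TCA CCA CTA CTG ATG CCA TTG"
print(translate(first_row))

# A single-base misreading of TAC as TAG would introduce an in-frame stop.
print(translate("CAG TAC TGC"), translate("CAG TAG TGC"))
```

A check of this kind makes single-character scanning errors in the codon line easy to spot, because the translated residue no longer matches the amino acid printed beneath it.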



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 43 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CTAGAAGCCT CGGCAACGTC AGCAACGGCG GAAGAATCCG GTG 43



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 43 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CATGCACCGG ATTCTTCCGC CGTTGCTGAC GTTGCCGAGG CTT 43



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 55 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GATCCCATGG CGCCCCTTAA GTCCACCGCC AGCCTCCCCG TCGCCCGCCG CTCCT 55



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 55 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CTAGAGGAGC GGCGGGCGAC GGGGAGGCTG GCGGTGGACT TAAGGGGCGC CATGG 55



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 59 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CATGGCGCCC ACCGTGATGA TGGCCTCGTC GGCCACCGCC GTCGCTCCGT TCCAGGGGC 59



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 59 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TTAAGCCCCT GGAACGGAGC GACGGCGGTG GCCGACGAGG CCATCATCAC GGTGGGCGC 59



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GAAACCATGG CCAGTGTGAT TGCGCAGGCA 30



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GAAAGGTACC TTACAACAAC TGTGCCAGC 29

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3639 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TCTAGATTAC ATAATACACC TAATAATCTT GTGTTGTTTG TTTACTTCTC AACTTATTTA 60
AGTTGGATTA TATTCCATCT TTTCTTTTTT ATTTGTCTGT TTTAGTTAAA AATGAACTAA 120
CAAACGACAA ATATTCGAGA ACGAGATAGT ATAATCTATA GGATAATCAG ACATGTCCTT 180
AGAGGGTGTT TGTTTAGAAT TATAATATGT ATAGAATATA TAATCCAACA AATTTTGAAC 240
TAACAAGTTT AAAATTTGAT AGATTATATA ATCTGGGCAC ATTATAATCC TAAACAAACA 300
CCATCTTAGT AATTTTTTAT TTAGTGCTCC GTTTGGATGT GAAGAAGATG GAGTTGAATA 360
CCAAATCATG TATGATACTG AAATGAGATG TAATTTTAAT TCTATTGTTT GGATGTCGTT 420
GAATTGGAGT TTGAAGTTAT GCGGTCTAAT TTTACGCAAT ACCGAGATGA GACTTTATAC 480
TAGGAGAGGG GTTTCTAGTT ATAGCCTAAT TCTTAAAGAAT TGAGTCTCTA TTTCCAAATC 540
TTAATTTTAT GCAACTAAAC AACACAATTT AGAAAAACTG TTTTCAATTT CTTATTCTGT 600
GCTCCAAACG AGGTGGAGTA TTTAGAAGTA GATAAGCGCC TCTGCTGCAC GAAGCGATGA 660
ACGCACTCTG ACGGTCTTGC CACTACAAAT AAGCCGCACC GCATTTCGGA AGGCCACGCG 720
ACCGCCACCT CCCCGAAGCT GCCGCGACCG ATCGAGCGAA GCGTCGCTCC CCGCGCCGCC 780
GCCAAAACCC TAGCTTCTCC TACTCCATGG CCACTGTCTC GCTCACCCCG CAGGCTGTCT 840
TCTCCACGGA GTCCGGTGGC GCCCTGGCCT CTGCTACCAT CCTCCGCTTT CCGCCAAACT 900
TTGTCCGCCA GCTTAGCACC AAGGCACGCC GCAACTGCAG CAACATCGGC GTCGCGCAGA 960
TCGTCGCCGC CGCGTGGTCC GACTGCCCCG CCGCTCGCCC CCACTTAGGC GGCGGCGGCC 1020
GCCGCGCCCG CGGCGTGGCC TCCTCCCACG CCGCGGCTGC ATCGGCCGCC GCCGCCGCCT 1080
CCGCGGCGGC GGAGGTCAGC GCAATTCCCA ACGCTAAGGT TGCGCAACCG TCCGCCGTCG 1140
TCTTGGCCGA GCGTAACCTG CTCGGCTCCG ACGCCAGCCT CGCCGTCCAC GCGGGTACCC 1200
TACCCTGCTA GCTCGTCTCT TTACTGTAAG ATCTAGGTTC TATGCTTTTT TCCCCTTTCG 1260
ATGATTCCTT TGTGGCTTTG CTGCCTTTTT ATCTGAAACA GGGGAGAGGC TGGGAAGAAG 1320
GATCGCCACG GATGCGATCA CCACACCGGT AGTGAACACG TCGGCCTACT GGTTCAACAA 1380
CTCGCAAGAG CTAATCGACT TTAAGGTAGT GAATATTCGT GCTTGCTCTT GTCTAATTTG 1440
ACGGATGTGA GTTTTGACGC CGAAATATTA AGTTTTATCT GTTCCTTAGG AGGGGAGGCA 1500
TGCTAGCTTC GAGTATGGGA GGTATGGGAA CCCGACCACG GAGGCATTAG AGAAGAAGAT 1560
GAGGTGATGC TCGATAGTGG AAATGTCGGC ACCCTGTTGG TTGCATTTGG CTGGAGGCTA 1620
AACAGTTGCG TGTTCTCATG GTGCAGCGCA CTGGAGAAAG CAGAGTCCAC AGTGTTCGTG 1680
GCATCGGGGA TGTATGCAGC TGCGGCTATG CTCAGTGCAC TTGTTCCGGC TGGTGGGCAC 1740
ATTCTGACCA CCACGGATTG CTACCGGAAA ACAAGGATTT ACATGGAAAC TGAGCTCCCC 1800
AAGAGGGGAA TTTCGGTAAT ACCATGCGAT CTTTTAAGCT CTACTTGTTT TTAGAACGGG 1860
ACATCTGCTA TCACTATTGG TTGTCTTCCT GTCACTGTGC TACAGTAGTG GGTCTACAAT 1920
GAACTTGCTC TTATTCAGTT AAAATTACTC TGTCGTGTTG TCCTTATCTA GCTAATAGTC 1980
TCTACAAAGT TCAGTTACTT CAGCATAGCC AATAGQAGTA GCATAACTAC TGCAGGGTAT 2040
ATGAACAATA TCCTTTGCAG TAGCTGTTGG GAGTACACAG TACAGTATGG CTTCAGACTT 2100
TATTCTTTGT ACTGCATTGG GTGAAGCCAC ATAGGGTTTG CCGAGTGCAC GTGCACCAGG 2160
GAAAAAACAA TTTCTACTTT TCTAGTGATT AAAAACTAAA TTTTACCACT CATGCACACC 2220
CTAATTTTTA ATTAGAGAAG ATTTTCAATA CATGTGTATA TTGAAATGTC AAGTGTGCAC 2280
TCGGATTCTC CGGCCTCTAG CTTCGCCCGA CTGCAATGTC AATAGGATTG GCTATCTGTA 2340
AAGGATTTAA GTAGAACTGC TTGTGGTAAT AAATTTTAGG ATCCCTCACA ATAAGATTTA 2400
TTATATAATC ACACCATCTA CCAGTTGAAA TGCAGTGAGA GCACTTTGTG AGTTGTATAC 2460
CAATGTTTCT CACGCTTCAC TTAGCATGTG ATACTGTTTA TGCTCAGATG ACTGTCATTA 2520
GGCCTGCTGA CATGGATGCT CTACAAAATG CGTTGGACAA CAATAATGTG AGTGTGGTAT 2580
CATTTCCATT GCCCCTGATC GTGGTAAAAA ACATACATTA ATACATTTGC AAATGTAGCC 2640
TAACCTTATG GCCATGTCAG GTATCTCTTT TCTTCACGGA GACTCCCACA AATCCATTTC 2700
TCAGATGCAT TGATATTGAA CATGTATCAA ATATGTGCCA TAGCAAGGGA GCGTTGCTTT 2760
GTATCGACAG TACTTTTGCC TCCCCTATCA ATCAGAAGGC ACTGACTTTA GGCGCTGACC 2820
TAGTTATTCA TTCTGCAACA AAGTACATTG CTGGACACAA CGATGTGAGT TGATATACTG 2880
AACCCCATCT CCCCTCATTA AAGTTATGTG TTTGCACATT GCACTAACTA GTACTTCAAC 2940
TTCCCAGGTT ATTGGAGGAT GCGTCAGTGG CAGAGATGAG TTGGTTTCCA AAGTCCGTAT 3000
TTATCACCAT GTGGTTGGTG GTGTTCTAAA CCCGGTAAGT TTAGATTGTT AAAGTTTTGT 3060
TTCCATTTAT TTCATCTTCC TTGCACAGGT TGTATGTATT TACAGATTCC CATAGTTACA 3120
AGCTTCTATT TTTATAGGTA GAAAATCGTG TAATTTTCTT TAGTAGCATA TGTTTAGGTT 3180
AGAAAAATAA TTTGCTTTCT CTGAGTATCA CAAACCGCAT CCAGTTCTCT GTTACATGAA 3240
CTAG^TTCT GGTTCTGGAA AGGAAGAAAT AGGATATGTT CTGTGCACTG CAATATATAT 3300
CTAATCATTA ATCCGGAGCT TTATGTCACA GACTCACAGG CCAGGCTACC ACTTTATGAA 3360
ATATTCCAAA TTATGCTTGT CTCAAAATGG AATGACTCAT GTTGTACTCT GTTCCAACGT 3420
TTTCAAATCA TGACTAGGAT TCTAGTTGCC CGGACACCGA CTAGGTGATT AATCGTGACT 3480
AGGCATTGAC TAGTCACGAT TAGTTTTGAG CTAGTCGAAC TTATCAACAA CTTGTTCCAG 3540
GCAATATATT GCAGTACTAT GCCTTATTGA TTGGGTATAT AAATGAATTT TAGCACACAG 3600
ATAGAGCAGA AGTAAGACAA ATTAACACAA AGTTCTAGA 3639



(2) INFORMATION FOR SEQ ID NO:27:









(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 509 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Met Ala Thr Val Ser Leu Thr Pro Gln Ala Val Phe Ser Thr Glu Ser
1 5 10 15

Gly Gly Ala Leu Ala Ser Ala Thr Ile Leu Arg Phe Pro Pro Asn Phe
20 25 30

Val Arg Gln Leu Ser Thr Lys Ala Arg Arg Asn Cys Ser Asn Ile Gly
35 40 45

Val Ala Gln Ile Val Ala Ala Ala Trp Ser Asp Cys Pro Ala Ala Arg
50 55 60

Pro His Leu Gly Gly Gly Gly Arg Arg Ala Arg Gly Val Ala Ser Ser
65 70 75 80

His Ala Ala Ala Ala Ser Ala Ala Ala Ala Ala Ser Ala Ala Ala Glu
85 90 95

Val Ser Ala Ile Pro Asn Ala Lys Val Ala Gln Pro Ser Ala Val Val
100 105 110

Leu Ala Glu Arg Asn Leu Leu Gly Ser Asp Ala Ser Leu Ala Val His
115 120 125

Ala Gly Glu Arg Leu Gly Arg Arg Ile Ala Thr Asp Ala Ile Thr Thr
130 135 140

Pro Val Val Asn Thr Ser Ala Tyr Trp Phe Asn Asn Ser Gln Glu Leu
145 150 155 160

Ile Asp Phe Lys Glu Gly Arg His Ala Ser Phe Glu Tyr Gly Arg Tyr
165 170 175

Gly Asn Pro Thr Thr Glu Ala Leu Glu Lys Lys Met Ser Ala Leu Glu
180 185 190

Lys Ala Glu Ser Thr Val Phe Val Ala Ser Gly Met Tyr Ala Ala Val
195 200 205

Ala Met Leu Ser Ala Leu Val Pro Ala Gly Gly His Ile Val Thr Thr
210 215 220

Thr Asp Cys Tyr Arg Lys Thr Arg Ile Tyr Met Glu Asn Glu Leu Pro
225 230 235 240

Lys Arg Gly Ile Ser Met Thr Val Ile Arg Pro Ala Asp Met Asp Ala
245 250 255

Leu Gln Asn Ala Leu Asp Asn Asn Asn Val Ser Leu Phe Phe Thr Glu
260 265 270

Thr Pro Thr Asn Pro Phe Leu Arg Cys Ile Asp Ile Glu His Val Ser
275 280 285

Asn Met Cys His Ser Lys Gly Ala Leu Leu Cys Ile Asp Ser Thr Phe
290 295 300

Ala Ser Pro Ile Asn Gln Lys Ala Leu Thr Leu Gly Ala Asp Leu Val
305 310 315 320

Ile His Ser Ala Thr Lys Tyr Ile Ala Gly His Asn Asp Val Ile Gly
325 330 335

Gly Cys Val Ser Gly Arg Asp Glu Leu Val Ser Lys Val Arg Ile Tyr
340 345 350

His His Val Val Gly Gly Val Leu Asn Pro Asn Ala Ala Tyr Leu Ile
355 360 365

Leu Arg Gly Met Lys Thr Leu His Leu Arg Val Gln Cys Gln Asn Asp
370 375 380

Thr Ala Leu Arg Met Ala Gln Phe Leu Glu Glu His Pro Lys Ile Ala
385 390 395 400

Arg Val Tyr Tyr Pro Gly Leu Pro Ser His Pro Glu His His Ile Ala
405 410 415

Lys Ser Gln Met Thr Gly Phe Gly Gly Val Val Ser Phe Glu Val Ala
420 425 430

Gly Asp Phe Asp Ala Thr Arg Lys Phe Ile Asp Ser Val Lys Ile Pro
435 440 445

Tyr His Ala Pro Ser Phe Gly Gly Cys Glu Ser Ile Ile Asp Gln Pro
450 455 460

Ala Ile Met Ser Tyr Trp Asp Ser Lys Glu Gln Arg Asp Ile Tyr Gly
465 470 475 480

Ile Lys Asp Asn Leu Ile Arg Phe Ser Ile Gly Val Glu Asp Phe Glu
485 490 495

Asp Leu Lys Asn Asp Leu Val Gln Ala Leu Glu Lys Ile
500 505

What is claimed is:

1. An isolated nucleic acid fragment encoding a plant cystathionine
γ-synthase.

2. The isolated nucleic acid fragment of Claim 1 encoding a corn
cystathionine γ-synthase.

3. An isolated nucleic acid fragment comprising

(a) the first nucleic acid fragment of Claim 1; and

(b) a second nucleic acid fragment encoding aspartokinase which is
insensitive to end-product inhibition.

4. The nucleic acid fragment of Claim 3, wherein either:

(a) the first nucleic acid fragment is derived from corn; or

(b) the second nucleic acid fragment comprises a nucleotide
sequence essentially similar to the sequence shown in SEQ ID NO:4 encoding
E. coli AKIII, said nucleic acid fragment encoding a lysine-insensitive variant of
E. coli AKIII and further characterized in that at least one of the following
conditions is met:

(1) the amino acid at position 318 is an amino acid other than
threonine; or

(2) the amino acid at position 352 is an amino acid other than
methionine.

5. An isolated nucleic acid fragment comprising

(a) the first nucleic acid fragment of Claim 1; and

(b) a second nucleic acid fragment encoding a bi-functional protein
with aspartokinase and homoserine dehydrogenase activities, both of which are
insensitive to end-product inhibition.

6. The nucleic acid fragment of Claim 5, wherein either:

(a) the first nucleic acid fragment is derived from corn; or

(b) the second nucleic acid fragment comprises a nucleotide
sequence essentially similar to the E. coli metL gene.

7. A chimeric gene wherein the nucleic acid fragment of Claim 1 is
operably linked to a seed-specific regulatory sequence.

8. A nucleic acid fragment comprising

(a) the chimeric gene of Claim 7 and

(b) a second chimeric gene wherein a nucleic acid fragment encoding
aspartokinase which is insensitive to end-product inhibition is operably linked to a
plant chloroplast transit sequence and to a seed-specific regulatory sequence.

9. A nucleic acid fragment comprising

(a) the first chimeric gene of Claim 7 and

(b) a second chimeric gene wherein a nucleic acid fragment encoding
a bi-functional protein with aspartokinase and homoserine dehydrogenase
activities, both of which are insensitive to end-product inhibition, is operably linked
to a plant chloroplast transit sequence and to a seed-specific regulatory sequence.

10. A plant comprising in its genome the chimeric gene of Claim 7 or the
nucleic acid fragment of Claim 8 or Claim 9.

11. Seeds containing the chimeric gene of Claim 7 or the nucleic acid
fragment of Claim 8 or Claim 9 obtained from the plant of Claim 10.

12. A method for increasing the methionine content of the seeds of plants
comprising:

(a) transforming plant cells with the chimeric gene of Claim 7 or the
nucleic acid fragment of Claim 8 or Claim 9;

(b) growing fertile mature plants from the transformed plant cells
obtained from step (a) under conditions suitable to obtain seeds; and

(c) selecting from the progeny seed of step (b) for those seeds
containing increased levels of methionine compared to untransformed seeds.

13. A plant comprising in its genome

(a) a first nucleic acid fragment of Claim 8 or Claim 9 or a first
chimeric gene of Claim 7 and

(b) a chimeric gene wherein a nucleic acid fragment encoding a
methionine-rich protein, wherein the weight percent methionine is at least 15%, is
operably linked to a seed-specific regulatory sequence.

14. A nucleic acid fragment comprising

(a) a first nucleic acid fragment of Claim 8 or Claim 9 or a first
chimeric gene of Claim 7 and

(b) a chimeric gene wherein a nucleic acid fragment encoding a
methionine-rich protein, wherein the weight percent methionine is at least 15%, is
operably linked to a seed-specific regulatory sequence.
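
The 15% weight-percent threshold recited for the methionine-rich protein can be illustrated with a short calculation. The Python sketch below assumes one plausible definition, namely the mass of methionine residues divided by the total residue mass of the polypeptide, using standard average residue masses; the claims here do not spell out the exact method, so the definition, the function name and the example fragment are illustrative assumptions.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (one-letter codes); these are the usual textbook values.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def weight_percent_methionine(protein: str) -> float:
    """Methionine residue mass as a percentage of total residue mass.

    Assumed definition for illustration only; terminal water is ignored.
    """
    if not protein:
        raise ValueError("empty protein sequence")
    total = sum(RESIDUE_MASS[aa] for aa in protein)
    met = protein.count("M") * RESIDUE_MASS["M"]
    return 100.0 * met / total


# Residues 55-65 region of the methionine-rich protein in the listing
# (Pro Met Met Met Pro Gly Met Met Pro Pro Met Thr Met Met Pro).
fragment = "PMMMPGMMPPMTMMP"
value = weight_percent_methionine(fragment)
print(f"{value:.1f}% methionine by weight; meets 15% threshold: {value >= 15.0}")
```

Applied to a full-length protein sequence in one-letter code, the same function gives a single figure that can be compared with the threshold.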
15. A plant comprising in its genome the nucleic acid fragment of
Claim 14.

16. Seeds obtained from the plant of Claim 13 or Claim 15 and
containing either

(a) a first nucleic acid fragment of Claim 8 or Claim 9 or a first
chimeric gene of Claim 7 and

(b) a chimeric gene wherein a nucleic acid fragment encoding a
methionine-rich protein, wherein the weight percent methionine is at least 15%, is
operably linked to a seed-specific regulatory sequence, or

(c) the nucleic acid fragment of Claim 14.

17. A method for increasing the methionine content of the seeds of plants
comprising:

(a) transforming plant cells with the nucleic acid fragment of
Claim 14;

(b) growing fertile mature plants from the transformed plant cells
obtained from step (a) under conditions suitable to obtain seeds; and

(c) selecting from the progeny seed of step (b) those seeds
containing increased levels of methionine compared to untransformed seeds.

18. A chimeric gene wherein the nucleic acid fragment of Claim 1 is
operably linked to a regulatory sequence capable of expression in microbial cells.

19. A method for producing plant cystathionine gamma synthase
comprising:

(a) transforming a microbial host cell with the chimeric gene of
Claim 18;

(b) growing the transformed microbial cells obtained from step (a)
under conditions that result in the expression of plant cystathionine gamma
synthase protein.

20. A nucleic acid fragment essentially similar to that described by
SEQ ID NO:1.

21. A nucleic acid fragment essentially similar to that described by
SEQ ID NO:26.